Commit Graph

187 Commits

Author SHA1 Message Date
Willy Tarreau
a9ff5e64c1 CLEANUP: stream-int: fix a few outdated comments about stream_int_register_handler()
They were not updated after the infrastructure change.
2015-07-19 18:46:30 +02:00
Willy Tarreau
0b1a4541dc MEDIUM: stream-int: pause the appctx if the task is woken up
If we're going to call the task we don't need to call the appctx anymore
since the task may decide differently in the end and will do the proper
thing using ->update(). This reduces one wake up call per session and
may go down to half in case of high concurrency (scheduling races).
2015-04-23 17:56:17 +02:00
Willy Tarreau
fe127937a8 MEDIUM: applet: make the applets only use si_applet_{cant|want|stop}_{get|put}
The applets don't fiddle with SI_FL_WAIT_ROOM anymore, instead they indicate
what they want, possibly that they failed (eg: WAIT_ROOM), and it's done() /
update() which finally updates the WAIT_* flags according to the channels'
and stream interface's states. This solves the issue of the pauses during a
"show sess" without creating busy loops.
2015-04-23 17:56:17 +02:00
Willy Tarreau
563cc37609 MAJOR: stream: use a regular ->update for all stream interfaces
Now si->update() is used to update any type of stream interface, whether
it's an applet, a connection or even nothing. We don't call si_applet_call()
anymore at the end of the resync and we don't have the risk that the
stream's task is reinserted into the run queue, which makes the code
a bit simpler.

The stream_int_update_applet() function was simplified to ensure that
it remained compatible with this standardized calling convention. It
was almost copy-pasted from the update code dedicated to connections.
Just like for si_applet_done(), it seems that it should be possible to
merge the two functions except that it would require some slow operations,
except maybe if the type of end point is tested inside the update function
itself.
2015-04-23 17:56:16 +02:00
Willy Tarreau
828824af05 MAJOR: applet: now call si_applet_done() instead of si_update() in I/O handlers
The applet I/O handlers now rely on si_applet_done() which itself decides
to wake up or sleep the appctx. Now it becomes critical that applte handlers
properly call this on every exit path so that the appctx is removed from the
active list after I/O have been handled. One such call was added to the Lua
socket handler. It used to work without it probably because the main task is
woken up by the parent task but now it's needed.
2015-04-23 17:56:16 +02:00
Willy Tarreau
e5f8649102 MEDIUM: stream-int: add a new function si_applet_done()
This is the equivalent of si_conn_wake() but for applets. It will be
called after changes to the stream interface are brought by the applet
I/O handler. Ultimately it will release buffers and may be even wake
the stream's task up if some important changes are detected.

It would be nice to be able to merge it with the connection's wake
function since it mostly manipulates the stream interface, but there
are minor differences (such as how to enable/disable polling on a fd
vs applet) and some specificities to applets (eg: don't wake the
applet up until the output is empty) which would require abstract
functions which would slow down everything.
2015-04-23 17:56:16 +02:00
Willy Tarreau
d45b9f8991 REORG: stream-int: create si_applet_ops dedicated to applets
These functions are dedicated to applets so that we don't use the default
ones anymore in this case.
2015-04-23 17:56:16 +02:00
Willy Tarreau
3057645b37 CLEANUP: applet: rename struct si_applet to applet
Since this one does not depend on stream_interface anymore, remove the
"si_" prefix.
2015-04-23 17:56:16 +02:00
Willy Tarreau
8a8d83b85c REORG: applet: move the applet definitions out of stream_interface
We're tidying the definitions so that appctx lives on its own. A new
set of applet.h files has been added for this purpose.
2015-04-23 17:56:16 +02:00
Willy Tarreau
a7513f5d00 MINOR: stream-int: make appctx_new() take the applet in argument
Doing so simplifies the initialization of a new appctx. We don't
need appctx_set_applet() anymore.
2015-04-06 11:37:32 +02:00
Willy Tarreau
87b09668be REORG/MAJOR: session: rename the "session" entity to "stream"
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.

In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.

The files stream.{c,h} were added and session.{c,h} removed.

The session will be reintroduced later and a few parts of the stream
will progressively be moved overthere. It will more or less contain
only what we need in an embryonic session.

Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.

Once all changes are completed, we should see approximately this :

   L7 - http_txn
   L6 - stream
   L5 - session
   L4 - connection | applet

There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.

Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
2015-04-06 11:23:56 +02:00
Willy Tarreau
6b5a9c23ce CLEANUP: stream-int: remove inclusion of fd.h that is not used anymore
That's a historic achievement, stream_interface.c doesn't manipulate
any file descriptor anymore. It only relies on connections or applets.
2015-03-13 00:46:47 +01:00
Willy Tarreau
d85c48589a REORG: connection: move conn_drain() to connection.c and rename it
It's now called conn_sock_drain() to make it clear that it only reads
at the sock layer and not at the data layer. The function was too big
to remain inlined and it's used at a few places where size counts.
2015-03-13 00:42:48 +01:00
Willy Tarreau
f31fb07958 MEDIUM: connection: make conn_drain() perform more controls
Currently si_idle_conn_null_cb() has to perform some low-level checks
over the file descriptor and the connection configuration that should
only belong to conn_drain(). Let's move these controls there. The
function now automatically checks for errors and hangups on the file
descriptor for example, and disables recv polling if there's no drain
function at the control layer.
2015-03-13 00:32:20 +01:00
Willy Tarreau
0a03c0f022 MEDIUM: stream-int: make conn_si_send_proxy() use conn_sock_send()
This substantially simplifies the code as we don't need to handle the
file descriptors anymore nor the specific error codes from send().
2015-03-13 00:09:30 +01:00
Willy Tarreau
1398aa19d8 MEDIUM: stream-int: replace xprt->shutw calls with conn_data_shutw()
Now that the connection performs the correct controls when shutting down,
use that in the few places where conn->xprt->shutw() was called. The calls
were split between conn_data_shutw() and conn_data_shutw_hard() depending
on the argument. Since the connection flags are updated, we don't need to
call conn_data_stop_send() anymore, instead we just have to call
conn_cond_update_polling().
2015-03-12 23:04:07 +01:00
Willy Tarreau
4dfd54f26a MINOR: stream-int: use conn_sock_shutw() to shutdown a connection
Stop calling shutdown() on the connection's fd. Note, this also seems
to fix a bug which was harmless, but which consisted in not marking
the connection as shutdown at the socket level until the other side
was shut as well.
2015-03-12 22:44:53 +01:00
Willy Tarreau
1140512f76 CLEANUP: stream-int: remove a redundant clearing of the linger_risk flag
In stream_sock_read0(), we used to clear this flag. But the only case
where stream_sock_read0() is called is in reaction with a conn_sock_read0()
event coming from the lower layers, which already clears this flag. So let's
remove this duplicate one and clear one of the few remaining layering
violations in this area.
2015-03-12 22:32:27 +01:00
Willy Tarreau
78955f4c8b MEDIUM: session: simplify receive buffer allocator to only use the channel
Now that we can get the session from the channel, let's simplify the
prototype of session_alloc_recv_buffer() to only require the channel.
Both the caller and the function are now simplified.
2015-03-11 20:41:47 +01:00
Willy Tarreau
afc8a22ad7 CLEANUP: stream-int: limit usage of si_ic/si_oc
As much as possible, we copy the result of this function into a local
variable to avoid having to check the flag all the time.
2015-03-11 20:41:47 +01:00
Willy Tarreau
50fe03be78 CLEANUP: stream-int: add si_opposite() to find the other stream interface
At a few places we need to find one stream interface from the other one.
Instead of passing via the channel, we simply use the session as an
intermediary, which simply results in applying an offset to the pointer.
2015-03-11 20:41:47 +01:00
Willy Tarreau
4e4292b9af CLEANUP: stream-int: add si_ib/si_ob to dereference the buffers
This makes the code cleaner and is more intuitive to use.
2015-03-11 20:41:46 +01:00
Willy Tarreau
07373b8660 MEDIUM: stream-int: use si_task() to retrieve the task from the stream int
We go back to the session to get the owner. Here again it's very easy
and is just a matter of relative offsets. Since the owner always exists
and always points to the session's task, we can remove some unneeded
tests.
2015-03-11 20:41:46 +01:00
Willy Tarreau
2bb4a96f8f REORG/MEDIUM: stream-int: introduce si_ic/si_oc to access channels
We'll soon remove direct references to the channels from the stream
interface since everything belongs to the same session, so let's
first not dereference si->ib / si->ob anymore and use macros instead.
2015-03-11 20:41:46 +01:00
Willy Tarreau
319f745ba0 MINOR: channel: rename bi_erase() to channel_truncate()
It applies to the channel and it doesn't erase outgoing data, only
pending unread data, which is strictly equivalent to what recv()
does with MSG_TRUNC, so that new name is more accurate and intuitive.
2015-01-14 20:32:59 +01:00
Willy Tarreau
b5051f8742 MINOR: channel: rename bi_avail() to channel_recv_max()
This name more accurately reminds that it applies to a channel and not
to a buffer, and that what is returned may be used as a max number of
bytes to pass to recv().
2015-01-14 20:26:54 +01:00
Willy Tarreau
3889fffe92 MINOR: channel: rename channel_full() to !channel_may_recv()
This function's name was poorly chosen and is confusing to the point of
being suspiciously used at some places. The operations it does always
consider the ability to forward pending input data before receiving new
data. This is not obvious at all, especially at some places where it was
used when consuming outgoing data to know if the buffer has any chance
to ever get the missing data. The code needs to be re-audited with that
in mind. Care must be taken with existing code since the polarity of the
function was switched with the renaming.
2015-01-14 18:41:33 +01:00
Willy Tarreau
56efc4896b OPTIM: stream-int: try to send pending spliced data
This is the equivalent of eb9fd51 ("OPTIM: stream_sock: reduce the amount
of in-flight spliced data") whose purpose is to try to immediately send
spliced data if available.
2014-12-24 23:47:33 +01:00
Willy Tarreau
9b20c55562 MEDIUM: stream-int: support splicing from applets
If we want to splice from applets, we must check the pipe before clearing
SI_FL_WAIT_ROOM.
2014-12-24 23:47:33 +01:00
Willy Tarreau
10fc09e872 MAJOR: session: only allocate buffers when needed
A session doesn't need buffers all the time, especially when they're
empty. With this patch, we don't allocate buffers anymore when the
session is initialized, we only allocate them in two cases :

  - during process_session()
  - during I/O operations

During process_session(), we try hard to allocate both buffers at once
so that we know for sure that a started operation can complete. Indeed,
a previous version of this patch used to allocate one buffer at a time,
but it can result in a deadlock when all buffers are allocated for
requests for example, and there's no buffer left to emit error responses.
Here, if any of the buffers cannot be allocated, the whole operation is
cancelled and the session is added at the tail of the buffer wait queue.

At the end of process_session(), a call to session_release_buffers() is
done so that we can offer unused buffers to other sessions waiting for
them.

For I/O operations, we only need to allocate a buffer on the Rx path.
For this, we only allocate a single buffer but ensure that at least two
are available to avoid the deadlock situation. In case buffers are not
available, SI_FL_WAIT_ROOM is set on the stream interface and the session
is queued. Unused buffers resulting either from a successful send() or
from an unused read buffer are offered to pending sessions during the
->wake() callback.
2014-12-24 23:47:33 +01:00
Willy Tarreau
bf883e0aa7 MAJOR: session: implement a wait-queue for sessions who need a buffer
When a session_alloc_buffers() fails to allocate one or two buffers,
it subscribes the session to buffer_wq, and waits for another session
to release buffers. It's then removed from the queue and woken up with
TASK_WAKE_RES, and can attempt its allocation again.

We decide to try to wake as many waiters as we release buffers so
that if we release 2 and two waiters need only once, they both have
their chance. We must never come to the situation where we don't wake
enough tasks up.

It's common to release buffers after the completion of an I/O callback,
which can happen even if the I/O could not be performed due to half a
failure on memory allocation. In this situation, we don't want to move
out of the wait queue the session that was just added, otherwise it
will never get any buffer. Thus, we only force ourselves out of the
queue when freeing the session.

Note: at the moment, since session_alloc_buffers() is not used, no task
is subscribed to the wait queue.
2014-12-24 23:47:33 +01:00
Willy Tarreau
a69fc9f803 BUG/MAJOR: stream-int: properly check the memory allocation return
In stream_int_register_handler(), we call si_alloc_appctx(si) but as a
mistake, instead of checking the return value for a NULL, we test <si>.
This bug was discovered under extreme memory contention (memory for only
two buffers with 500 connections waiting) and after 3 million failed
connections. While it was very hard to produce it, the fix is tagged
major because in theory it could happen when haproxy runs with a very
low "-m" setting preventing from allocating just the few bytes needed
for an appctx. But most users will never be able to trigger it. The
fix was confirmed to address the bug.

This fix must be backported to 1.5.
2014-12-23 11:22:39 +01:00
Willy Tarreau
9dc1c61c43 BUG/CRITICAL: http: don't update msg->sov once data start to leave the buffer
Commit bb2e669 ("BUG/MAJOR: http: correctly rewind the request body
after start of forwarding") was incorrect/incomplete. It used to rely on
CF_READ_ATTACHED to stop updating msg->sov once data start to leave the
buffer, but this is unreliable because since commit a6eebb3 ("[BUG]
session: clear BF_READ_ATTACHED before next I/O") merged in 1.5-dev1,
this flag is only ephemeral and is cleared once all analysers have
seen it. So we can start updating msg->sov again each time we pass
through this place with new data. With a sufficiently large amount of
data, it is possible to make msg->sov wrap and validate the if()
condition at the top, causing the buffer to advance by about 2GB and
crash the process.

Note that the offset cannot be controlled by the attacker because it is
a sum of millions of small random sizes depending on how many bytes were
read by the server and how many were left in the buffer, only because
of the speed difference between reading and writing. Also, nothing is
written, the invalid pointer resulting from this operation is only read.

Many thanks to James Dempsey for reporting this bug and to Chris Forbes for
narrowing down the faulty area enough to make its root cause analysable.

This fix must be backported to haproxy 1.5.
2014-09-02 16:48:54 +02:00
David S
afb768340c MEDIUM: connection: Implement and extented PROXY Protocol V2
This commit modifies the PROXY protocol V2 specification to support headers
longer than 255 bytes allowing for optional extensions.  It implements the
PROXY protocol V2 which is a binary representation of V1. This will make
parsing more efficient for clients who will know in advance exactly how
many bytes to read.  Also, it defines and implements some optional PROXY
protocol V2 extensions to send information about downstream SSL/TLS
connections.  Support for PROXY protocol V1 remains unchanged.
2014-05-09 08:25:38 +02:00
Willy Tarreau
7e3127391f MINOR: config: make the stream interface idle timer user-configurable
The new tune.idletimer value allows one to set a different value for
idle stream detection. The default value remains set to one second.
It is possible to disable it using zero, and to change the default
value at build time using DEFAULT_IDLE_TIMER.
2014-02-12 16:36:12 +01:00
Willy Tarreau
c5890e66cd MEDIUM: stream-int: automatically disable CF_STREAMER flags after idle
Disabling the streamer flags after an idle period will help TCP proxies
to better adapt to the streams they're forwarding, especially with SSL
where this will allow the SSL sender to use smaller records. This is
typically used to optimally relay HTTP and derivatives such as SPDY or
HTTP/2 in pure TCP mode when haproxy is used as an SSL offloader.

This idea was first proposed by Ilya Grigorik on the haproxy mailing
list, and his tests seem to confirm the improvement :

  https://www.mail-archive.com/haproxy@formilux.org/msg12576.html
2014-02-12 11:46:03 +01:00
Willy Tarreau
7bed945be0 OPTIM: ssl: implement dynamic record size adjustment
By having the stream interface pass the CF_STREAMER flag to the
snd_buf() primitive, we're able to tell the send layer whether
we're sending large chunks or small ones.

We use this information in SSL to adjust the max record dynamically.
This results in small chunks respecting tune.ssl.maxrecord at the
beginning of a transfer or for small transfers, with an automatic
switch to full records if the exchanges last long. This allows the
receiver to parse HTML contents on the fly without having to retrieve
16kB of data, which is even more important with small initcwnd since
the receiver does not need to wait for round trips to start fetching
new objects. However, sending large files still produces large chunks.

For example, with tune.ssl.maxrecord = 2859, we see 5 write(2885)
sent in two segments each and 6 write(16421).

This idea was first proposed on the haproxy mailing list by Ilya Grigorik.
2014-02-06 11:37:29 +01:00
Willy Tarreau
1049b1f551 MEDIUM: connection: don't use real send() flags in snd_buf()
This prevents us from passing other useful info and requires the
upper levels to know these flags. Let's use a new flags category
instead : CO_SFL_*. For now, only MSG_MORE has been remapped.
2014-02-06 11:37:29 +01:00
Willy Tarreau
798c3c9c41 MINOR: stream-interface: no need to call fd_stop_both() on error
We don't need to call fd_stop_both() since we already call
conn_cond_update_polling() which will do it. This call was introduced by
commit d29a066 ("BUG/MAJOR: connection: always recompute polling status
upon I/O").
2014-01-26 00:42:31 +01:00
Willy Tarreau
708e717251 MEDIUM: stream-interface: the polling flags must always be updated in chk_snd_conn
We used to only update the polling flags in data phase, but after that
we could update other flags. It does not seem possible to trigger a
bug here but it's not very safe either. Better always keep them up to
date.
2014-01-26 00:42:30 +01:00
Willy Tarreau
fd803bb4d7 MEDIUM: connection: add check for readiness in I/O handlers
The recv/send callbacks must check for readiness themselves instead of
having their callers do it. This will strengthen the test and will also
ensure we never refrain from calling a handshake handler because a
direction is being polled while the other one is ready.
2014-01-26 00:42:30 +01:00
Willy Tarreau
e1f50c4b02 MEDIUM: connection: remove conn_{data,sock}_poll_{recv,send}
We simply remove these functions and replace their calls with the
appropriate ones :

  - if we're in the data phase, we can simply report wait on the FD
  - if we're in the socket phase, we may also have to signal the
    desire to read/write on the socket because it might not be
    active yet.
2014-01-26 00:42:30 +01:00
Willy Tarreau
310987a038 MAJOR: connection: remove the CO_FL_WAIT_{RD,WR} flags
These flags were used to report the readiness of the file descriptor.
Now this readiness is directly checked at the file descriptor itself.
This removes the need for constantly synchronizing updates between the
file descriptor and the connection and ensures that all layers share
the same level of information.

For now, the readiness is updated in conn_{sock,data}_poll_* by directly
touching the file descriptor. This must move to the lower layers instead
so that these functions can disappear as well. In this state, the change
works but is incomplete. It's sensible enough to avoid making it more
complex.

Now the sock/data updates become much simpler because they just have to
enable/disable access to a file descriptor and not to care anymore about
its readiness.
2014-01-26 00:42:30 +01:00
Willy Tarreau
e6300be8f8 BUG/MEDIUM: stream-interface: don't wake the task up before end of transfer
Recent commit d7ad9f5 ("MAJOR: channel: add a new flag CF_WAKE_WRITE to
notify the task of writes") was not correct. It used to wake up the task
as soon as there was some write activity and the flag was set, even if there
were still some data to be forwarded. This resulted in process_session()
being called a lot when transfering chunk-encoded HTTP responses made of
very large chunks.

The purpose of the flag is to wake up only a task waiting for some
room and not the other ones, so it's totally counter-productive to
wake it up as long as there are data to forward because the task
will not be allowed to write anyway.

Also, the commit above was taking some risks by not considering
certain events anymore (eg: state != SI_ST_EST). While such events
are not used at the moment, if some new features were developped
in the future relying on these, it would be better that they could
be notified when subscribing to the WAKE_WRITE event, so let's
restore the condition.
2014-01-25 22:28:22 +01:00
Willy Tarreau
46be2e5039 MEDIUM: connection: update callers of ctrl->drain() to use conn_drain()
Now we can more safely rely on the connection state to decide how to
drain and what to do when data are drained. Callers don't need to
manipulate the file descriptor's state anymore.

Note that it also removes the need for the fix ea90063 ("BUG/MEDIUM:
stream-int: fix the keep-alive idle connection handler") since conn_drain()
correctly sets the polling flags.
2014-01-20 22:27:17 +01:00
Willy Tarreau
7f4bcc312d MINOR: protocol: improve the proto->drain() API
It was not possible to know if the drain() function had hit an
EAGAIN, so now we change the API of this function to return :
  < 0 if EAGAIN was met
  = 0 if some data remain
  > 0 if a shutdown was received
2014-01-20 22:27:16 +01:00
Willy Tarreau
d7ad9f5b0d MAJOR: channel: add a new flag CF_WAKE_WRITE to notify the task of writes
Since commit 6b66f3e ([MAJOR] implement autonomous inter-socket forwarding)
introduced in 1.3.16-rc1, we've been relying on a stupid mechanism to wake
up the task after a write, which was an exact copy-paste of the reader side.

The principle was that if we empty a buffer and there's no forwarding
scheduled or if the *producer* is not in a connected state, then we wake
the task up.

That does not make any sense. It happens to wake up too late sometimes (eg,
when the request analyser waits for some room in the buffer to start to
work), and leads to unneeded wakeups in client-side keep-alive, because
the task is woken up when the response is sent, while the analysers are
simply waiting for a new request.

In order to fix this, we introduce a new channel flag : CF_WAKE_WRITE. It
is designed so that an analyser can explicitly request being notified when
some data were written. It is used only when the HTTP request or response
analysers need to wait for more room in the buffers. It is automatically
cleared upon wake up.

The flag is also automatically set by the functions which try to write into
a buffer from an applet when they fail (bi_putblk() etc...).

That allows us to remove the stupid condition above and avoid some wakeups.
In http-server-close and in http-keep-alive modes, this reduces from 4 to 3
the average number of wakeups per request, and increases the overall
performance by about 1.5%.
2013-12-31 18:37:36 +01:00
Willy Tarreau
61f7f0a959 BUG/MINOR: stream-int: do not clear the owner upon unregister
Since the applet rework and the removal of the inter-task applets,
we must not clear the stream-interface's owner task anymore otherwise
we risk a crash when maintaining keep-alive with an applet. This is
not possible right now so there is no impact yet, but this bug is not
easy to track down. No backport is needed.
2013-12-28 21:33:37 +01:00
Willy Tarreau
ea90063cbc BUG/MEDIUM: stream-int: fix the keep-alive idle connection handler
Commit 2737562 (MEDIUM: stream-int: implement a very simplistic idle
connection manager) implemented an idle connection handler. In the
case where all data is drained from the server, it fails to disable
polling, resulting in a busy spinning loop.

Thanks to Sander Klein and Guillaume Castagnino for reporting this bug.

No backport is needed.
2013-12-17 14:21:48 +01:00
Willy Tarreau
2737562e43 MEDIUM: stream-int: implement a very simplistic idle connection manager
Idle connections are not monitored right now. So if a server closes after
a response without advertising it, it won't be detected until a next
request wants to use the connection. This is a bit problematic because
it unnecessarily maintains file descriptors and sockets in an idle
state.

This patch implements a very simple idle connection manager for the stream
interface. It presents itself as an I/O callback. The HTTP engine enables
it when it recycles a connection. If a close or an error is detected on the
underlying socket, it tries to drain as much data as possible from the socket,
detect the close and responds with a close as well, then detaches from the
stream interface.
2013-12-17 00:00:28 +01:00