stream_int_chk_rcv_conn() did not clear connection flags before updating them. It
is uncertain whether this could have caused the stalled transfers that have been
reported since dev15.
In order to avoid further issues of this kind, we now use a simple inline
function to do the whole job.
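For illustration, here is a minimal sketch of such a helper, with a
hypothetical name and illustrative flag values (haproxy's real CO_FL_*
constants and helper differ):

    #include <stdint.h>

    /* Illustrative flag values only, not haproxy's real CO_FL_* constants. */
    #define CO_FL_WAIT_DATA  0x0001   /* waiting for incoming data */
    #define CO_FL_WAIT_ROOM  0x0002   /* waiting for buffer room */

    struct connection {
            uint32_t flags;
    };

    /* Hypothetical helper: clear the previous wait flags before setting
     * the new ones, so that a stale flag can never survive an update.
     */
    static inline void conn_update_rcv_flags(struct connection *conn,
                                             uint32_t new_flags)
    {
            conn->flags &= ~(CO_FL_WAIT_DATA | CO_FL_WAIT_ROOM);
            conn->flags |= new_flags;
    }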
There were a few synchronous calls to polling updates in some functions
called from the connection handler. They are not needed and should be
replaced by more efficient and more debuggable asynchronous calls.
Bryan Berry and Baptiste Assmann both reported some occasional CPU
spinning loops where haproxy was still processing I/O but burning
CPU for apparently uncaught events.
What happens is the following sequence:
- the proxy is in TCP mode
- an incoming client connection initiates a connection to a server
- the connection to the server does not complete immediately and is
polled for
- in the meantime, the client speaks and the stream interface
calls ->chk_snd() on the peer connection to send the new data
- chk_snd() calls send_loop() to send the data. The latter makes
the connection succeed and empties the buffer, so it disables
polling on the connection and on the FD by creating an update
entry.
- before the update is processed, poll() succeeds and reports
a write event for this fd. The poller does fd_ev_set() on the
FD to switch it to speculative mode
- the IO handler is called with a connection which has no write
flag but an FD which is enabled in speculative mode.
- the connection does nothing useful.
- conn_update_polling() at the end of conn_fd_handler() cannot
disable the FD because there were no changes on this FD.
- the handler is left with speculative polling still enabled on
the FD, and will be called over and over until a poll event is
needed to transfer data.
There is no perfectly elegant solution to this. At least we should
update the flags indicating the current polling status to reflect
what is being done at the FD level. This makes it possible to detect
that the FD needs to be disabled upon exit.
chk_snd() also needs minor changes to correctly switch to speculative
polling before calling send_loop(), and to reflect this in the connection
flags. This is needed so that no event remains stuck there without any
polling. In fact, chk_snd() and chk_rcv() should perform the same
preparations and cleanups as conn_fd_handler() does.
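Schematically, the symmetry looks like this; the helper names below are
hypothetical stand-ins for the connection-level polling API, declared
here only to keep the sketch self-contained:

    struct connection;

    /* Hypothetical stand-ins for the connection-level polling API. */
    void conn_refresh_polling_flags(struct connection *conn);
    void conn_data_want_send(struct connection *conn);
    void conn_data_stop_send(struct connection *conn);
    void conn_cond_update_polling(struct connection *conn);
    int  send_loop(struct connection *conn);  /* returns bytes left to send */

    static void chk_snd_sketch(struct connection *conn)
    {
            conn_refresh_polling_flags(conn); /* note what the FD level has */
            conn_data_want_send(conn);        /* switch to speculative send */

            if (send_loop(conn) == 0)          /* buffer emptied... */
                    conn_data_stop_send(conn); /* ...stop asking for send events */

            conn_cond_update_polling(conn);   /* propagate changes to the FD */
    }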
Errors and Hangups are sticky events, which means that once they're
detected, we never clear them, allowing them to be handled later if
needed.
Until now, when an error was reported, we used to register a speculative
I/O event for both recv and send. Since the connection had not requested
such events, it was not able to detect a change and did not clear them,
so the events were called in loops until a timeout caused their owner
task to die.
So this patch does two things:
- stop registering spec events when no I/O activity was requested,
so that we don't end up with a polling state that cannot be
disabled;
- keep the sticky polling flags (ERR and HUP) when leaving the
connection handler so that an error notification doesn't
magically become a normal recv() or send() report once the
event is converted to a spec event.
It is normally not needed to make the connection handler emit an
error when it detects POLL_ERR because either a registered data
handler will have done it, or the event will be disabled by the
wake() callback.
When the PROXY protocol header is expected and fails, leading to an
abort of the incoming connection, we now emit a log message. If option
dontlognull is set and it was just a port probe, then nothing is logged.
It's annoying that handshake handlers remove themselves from the
connection flags when they fail, because there is no way to tell
which one failed. So now we only remove them when they succeed.
The conn_local_send_proxy() function has to retrieve the local and remote
addresses, but the getpeername() and getsockname() functions may fail until
the connection is established. So now we catch this error and poll for write
when this happens.
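A minimal sketch of that error handling on a plain POSIX socket (the
helper name and return convention are illustrative):

    #include <errno.h>
    #include <sys/socket.h>

    /* Returns 0 when the address was retrieved, 1 when the connection is
     * not established yet (caller should poll for write and retry), and
     * -1 on a real error.
     */
    static int fetch_peer_addr(int fd, struct sockaddr_storage *peer)
    {
            socklen_t len = sizeof(*peer);

            if (getpeername(fd, (struct sockaddr *)peer, &len) == -1) {
                    if (errno == ENOTCONN)
                            return 1;
                    return -1;
            }
            return 0;
    }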
If an uncaught CO_FL_ERROR flag on a connection is detected, we
immediately go to the wakeup function. This ensures that even if
an error is asynchronously delivered, we don't risk re-enabling
polling or doing unexpected things in the handshake handlers.
Commit 0ffde2cc in 1.5-dev13 tried to always disable polling on file
descriptors when errors were encountered. Unfortunately it did not
always succeed in doing so because it relied on detecting polling
changes to disable it. Let's use a dedicated conn_stop_polling()
function that is unconditionally called upon error instead.
This managed to stop a busy loop observed when a health check makes
use of the send-proxy protocol and fails before the connection can
be established.
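A minimal sketch of what such an unconditional stop looks like, with
illustrative names and flag values:

    struct connection { unsigned int flags; int fd; };

    /* Stand-ins for the FD-level API; names are illustrative. */
    void fd_stop_recv(int fd);
    void fd_stop_send(int fd);

    #define CO_FL_POLL_MASK 0x00ff  /* illustrative: all polling status bits */

    /* Unconditional: do not rely on change detection to disable the FD. */
    static inline void conn_stop_polling_sketch(struct connection *c)
    {
            c->flags &= ~CO_FL_POLL_MASK;
            fd_stop_recv(c->fd);
            fd_stop_send(c->fd);
    }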
The CO_FL_WAIT_* flags were not cleared after updating polling flags.
This means that any caller of these functions that did not clear them
would enable polling instead of speculative I/O. This happens during
the stream interface update call which is performed from the session
handler for example.
For now this is not a problem because speculative I/O and polling
are handled the same way. However, with upcoming changes this causes
some deadlocks because enabling read processing on a file descriptor
where everything was already read will do nothing until something new
happens on this FD.
The correct fix consists in clearing the flags when leaving the update
functions.
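A minimal sketch of the fix, with illustrative flag values:

    #define CO_FL_WAIT_RD  0x0010   /* illustrative values, not haproxy's */
    #define CO_FL_WAIT_WR  0x0020

    struct connection { unsigned int flags; };

    static inline void conn_update_polling_leave(struct connection *c)
    {
            /* ... polling updates happen above this point ... */

            /* The one-shot WAIT flags must not survive the update,
             * otherwise the next caller would enable polling instead
             * of speculative I/O.
             */
            c->flags &= ~(CO_FL_WAIT_RD | CO_FL_WAIT_WR);
    }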
This fix does not need any backport as it was introduced with recent
connection changes (dev12) and not triggered until last commit.
This is the first step of a series of changes aiming at making the
polling totally event-driven. This first change consists in only
remembering at the connection level whether an FD was enabled or not,
regardless of whether it was being polled or cached. From now on, an
EAGAIN will always be considered as a change so that the pollers are
able to manage a cache and to flush it based on such events. One
noticeable effect is that conn_fd_handler() is called once more per
session (6 times instead of 5 at minimum), but other update functions
are called less often.
Note that the performance loss caused by this change at the moment is
quite significant, around 2.5%, but the change is needed to have SSL
working correctly in all situations, even when data were read from the
socket and stored in the invisible cache, waiting for some room in the
channel's buffer.
The trash is used everywhere to store the results of temporary strings
built out of s(n)printf, or as a storage for a chunk when chunks are
needed.
Using global.tune.bufsize is not the most convenient thing either.
So let's replace trash with a chunk and directly use it as such. We can
then use trash.size as the natural way to get its size, and get rid of
many intermediary chunks that were previously used.
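For illustration, a minimal model of such a chunk-based trash; haproxy's
real struct chunk and chunk_printf() are richer, so the details below
are simplified:

    #include <stdarg.h>
    #include <stdio.h>

    struct chunk {
            char *str;   /* beginning of the string */
            int size;    /* allocated size */
            int len;     /* current length */
    };

    struct chunk trash;  /* str/size are set at boot from tune.bufsize */

    /* Typical use: format into the trash and rely on trash.size as the
     * natural bound instead of global.tune.bufsize.
     */
    static int chunk_printf_sketch(struct chunk *chk, const char *fmt, ...)
    {
            va_list args;

            va_start(args, fmt);
            chk->len = vsnprintf(chk->str, chk->size, fmt, args);
            va_end(args);
            return chk->len;
    }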
The patch is huge because it touches many areas, but it makes the code
a lot clearer and even highlights places where trash was used without
it being obvious.
We will need to be able to switch server connections on a session and
to keep idle connections. In order to achieve this, the preliminary
requirement is that the connections can survive the session and be
detached from it.
Right now they're still allocated at exactly the same place, so when
there is a session, there are always 2 connections. We could soon
improve on this by allocating the outgoing connection only during a
connect().
This current patch touches a lot of code and intentionally does not
change any functionality. Performance tests show no regression (even
a very minor improvement). The doc has not yet been updated.
In some circumstances, if the connection to the server is aborted while
some data were planned to be sent and the poller reported an ability to
send, then conn_fd_handler() would still call conn->data->send(), causing
the data layer to dereference the now NULL conn->xprt and crash.
So we have to check for conn->xprt validity before calling the data
layer.
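A minimal self-contained sketch of the guard; the types below only model
the relevant connection fields:

    struct connection;

    struct data_cb {
            void (*recv)(struct connection *conn);
            void (*send)(struct connection *conn);
    };

    struct xprt_ops;  /* opaque here */

    struct connection {
            const struct xprt_ops *xprt; /* NULL once the transport is gone */
            const struct data_cb *data;
    };

    /* A poller may still report writability after the transport layer
     * was torn down, so check conn->xprt before calling the data layer.
     */
    static void handle_send_event(struct connection *conn)
    {
            if (conn->xprt && conn->data && conn->data->send)
                    conn->data->send(conn);
    }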
This issue was introduced after 1.5-dev12 so it does not need any backport
and does not affect any released version.
Special thanks go to Cristian Ditoiu who once again provided amazing help
to troubleshoot this bug!
Commit 9e272bf9 broke connection setup in TCP mode; the comment was
misleading and obviously wrong, as after a connection is established,
we *do* have none of the CONNECT* flags. However, we can never have
them all at the same time, so let's use this to trigger the detection.
This callback sends a PROXY protocol line on the outgoing connection,
with the local and remote endpoint information. This is used for local
connections (eg: health checks) where the other end needs to have a
valid address and no connection is relayed.
It was previously in frontend.c but there is no reason for this anymore
considering that all the information involved is in the connection itself
only. Theoretically this should be in the socket layer but we don't have
this yet.
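For reference, a stand-alone sketch showing the PROXY protocol v1 wire
format for IPv4, as described in doc/proxy-protocol.txt (haproxy's real
builder differs and error handling is elided):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>

    /* Builds e.g. "PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n". */
    static int make_proxy_line_v1(char *buf, int size,
                                  const struct sockaddr_in *src,
                                  const struct sockaddr_in *dst)
    {
            char s[INET_ADDRSTRLEN], d[INET_ADDRSTRLEN];

            inet_ntop(AF_INET, &src->sin_addr, s, sizeof(s));
            inet_ntop(AF_INET, &dst->sin_addr, d, sizeof(d));
            return snprintf(buf, size, "PROXY TCP4 %s %s %u %u\r\n",
                            s, d, ntohs(src->sin_port), ntohs(dst->sin_port));
    }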
We absolutely want to disable FD polling after an error is detected,
otherwise the data layer has to do it and it's far from being obvious
at these layers.
The way we did it was a bit tricky in conn_update_*_polling and
conn_*_polling_changes. However, it has almost no impact on performance
and code size, both for the fast path and the slow path.
We'll now be able to remove some flag updates in the stream interface.
Just like ->init(), ->wake() may now be used to return an error and
abort the connection. Currently this is not used but will be with
embryonic sessions.
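A minimal sketch of how the connection handler can honor that return
value; types are simplified and conn_full_close() is a hypothetical
abort path:

    struct connection;

    struct data_cb {
            /* returns < 0 to abort the connection, >= 0 otherwise */
            int (*wake)(struct connection *conn);
    };

    struct connection {
            const struct data_cb *data;
    };

    void conn_full_close(struct connection *conn); /* hypothetical */

    static void conn_handler_leave(struct connection *conn)
    {
            if (conn->data->wake && conn->data->wake(conn) < 0)
                    conn_full_close(conn);
    }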
We now check the connection flags for changes in order not to call the
data->wake callback when there is no activity. Activity means a change
on any of the CO_FL_*_SH, CO_FL_ERROR, CO_FL_CONNECTED, CO_FL_WAIT_CONN*
flags, as well as a call to data->recv or data->send.
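A minimal sketch of that change detection, with illustrative flag bits:

    /* Illustrative bits; haproxy's real CO_FL_* values differ. */
    #define CO_FL_ERROR      0x0100
    #define CO_FL_CONNECTED  0x0200
    #define CO_FL_WAIT_CONN  0x0400
    #define CO_FL_RD_SH      0x0800
    #define CO_FL_WR_SH      0x1000

    #define CO_FL_WAKE_MASK  (CO_FL_ERROR | CO_FL_CONNECTED | \
                              CO_FL_WAIT_CONN | CO_FL_RD_SH | CO_FL_WR_SH)

    /* Decide whether ->wake() must be called, given the flags saved on
     * entry to the handler, the current flags, and whether an I/O
     * callback ran.
     */
    static int must_wake(unsigned int initial_flags,
                         unsigned int current_flags, int io_done)
    {
            return io_done ||
                   ((current_flags ^ initial_flags) & CO_FL_WAKE_MASK);
    }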
The generic data-layer init callback is now used after the transport
layer is complete and before calling the data layer recv/send callbacks.
This allows the session to switch from the embryonic session data layer
to the complete stream interface data layer, by making conn_session_complete()
the data layer's init callback.
It still looks awkward that the init() callback must be used upon error,
but short of adding yet another one, it does not seem to be mergeable
into another function (eg: it should probably not be merged with ->wake
to avoid unneeded calls during the handshake, though semantically that
would make sense).
Instead of calling conn_notify_si() from the connection handler, we
now call data->wake(), which will allow us to use a different callback
with health checks.
Note that we still rely on a flag in order to decide whether or not
to call this function. The reason is that with embryonic sessions,
the callback is already initialized to si_conn_cb without the flag,
and we can't call the SI notify function in the leave path before
the stream interface is initialized.
This issue should be addressed by involving a different data_cb for
embryonic sessions and for stream interfaces, that would be changed
during session_complete() for the final data_cb.
Now conn->data will designate the data layer which is the client for
the transport layer. In practice it's the stream interface and will
soon also be the health checks.
It is better to avoid calling the data functions upon error or during the
handshake than to have to put conditions everywhere, which are too easy to
forget (one check for CO_FL_ERROR was missing, but this was harmless).
This data layer supports socket-to-buffer and buffer-to-socket operations.
No sock-to-pipe nor pipe-to-sock functions are provided, since splicing does
not provide any benefit with data transformation. At best it could save a
memcpy() and avoid keeping a buffer allocated but that does not seem very
useful.
An init function and a close function are provided because the SSL context
needs to be allocated/freed.
A data-layer shutw() function is also provided because upon successful
shutdown, we want to store the SSL context in the cache in order to reuse
it for future connections and avoid a new key generation.
The handshake function is directly called from the connection handler.
At this point it is not certain whether this will remain this way or
if a new ->handshake callback will be added to the data layer so that
the connection handler doesn't care about SSL.
The sock-to-buf and buf-to-sock functions are all capable of enabling
the SSL handshake at any time. This also implies polling in the opposite
direction to what was expected. The upper layers must take that into
account (it is OK right now with the stream interface).
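A sketch of that direction switching around the real OpenSSL
SSL_get_error() call; the two conn_sock_want_* helpers below are
stand-ins for the connection-level polling API:

    #include <openssl/ssl.h>

    /* Stand-ins for the connection-level polling API. */
    void conn_sock_want_recv(void *conn);
    void conn_sock_want_send(void *conn);

    /* After an SSL_read()/SSL_write() returning ret <= 0, map the
     * library's needs onto polling directions. This is how a receive
     * path may end up polling for *write* during a handshake.
     */
    static int handle_ssl_retry(void *conn, SSL *ssl, int ret)
    {
            switch (SSL_get_error(ssl, ret)) {
            case SSL_ERROR_WANT_READ:
                    conn_sock_want_recv(conn);
                    return 0;
            case SSL_ERROR_WANT_WRITE:
                    conn_sock_want_send(conn);
                    return 0;
            default:
                    return -1;  /* treated as fatal in this sketch */
            }
    }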
It appears that fd.h includes a number of unneeded files and was
included from standard.h, and as such served as an intermediary
to provide almost everything to everyone.
By removing its useless includes, a long dependency chain broke
but could easily be fixed.
If a data handler suddenly switches to a handshake mode and detects the
need for polling in either direction, we don't want to loop again through
the handshake handlers because we know we won't be able to do anything.
Similarly, we don't want to call again the data handlers after a loop
through the handshake handlers if polling is required.
No performance change was observed; it might only be observed during
high-rate SSL renegotiation.
It was observed that after a failed send() on EAGAIN, a second connect()
would still be attempted in tcp_connect_probe() because there was no way
to know that a send() had failed.
By checking the WANT_WR status flag, we know if a previous write attempt
failed on EAGAIN, so we don't try to connect again if we know this has
already failed.
With this simple change, the second connect() has disappeared.
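A minimal sketch of the guard (the flag value is illustrative):

    #define CO_FL_WANT_WR  0x0040  /* previous send() hit EAGAIN */

    /* A pending WANT_WR means a send() already failed on EAGAIN, so
     * polling is already armed and probing with a second connect()
     * is pointless.
     */
    static int should_probe_connect(unsigned int conn_flags)
    {
            return !(conn_flags & CO_FL_WANT_WR);
    }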
Polling flags were set for data and sock layer, but while this does make
sense for the ENA flag, it does not for the POL flag which translates the
detection of an EAGAIN condition. So now we remove the {DATA,SOCK}_POL*
flags and instead introduce two new layer-independent flags (WANT_RD and
WANT_WR). These flags are only set when an EAGAIN is encountered so that
polling can be enabled.
In order for these flags to have any meaning they are not persistent and
have to be cleared by the connection handler before calling the I/O and
data callbacks. For this reason, change detection has been slightly
improved. Instead of comparing the WANT_* flags with CURR_*_POL, we only
check if the ENA status changes, or if the polling appears, since we don't
want to detect the useless poll to ena transition. Tests show that this
has eliminated one useless call to __fd_clr().
Finally, the conn_set_polling() function, which was becoming complex and
required complex operations from the caller, was split in two, replacing
its only two callers (conn_update_data_polling and conn_update_sock_polling).
The two functions are now much smaller due to the less complex conditions.
Note that it would be possible to re-merge them and only pass a mask, but
this does not appear very interesting.
The PROXY protocol is now decoded in the connection before other
handshakes. This means that it may be extracted from a TCP stream
before SSL is decoded from this stream.
When an incoming connection request is accepted, a connection
structure is needed to store its state. However we don't want to
fully initialize a session until the data layer is about to be
ready.
As long as the connection is physically stored into the session,
it's not easy to split both allocations.
As such, we only initialize the minimum requirements of a session,
which results in what we call an embryonic session. Then once the
data layer is ready, we can complete the session's initialization.
Doing so avoids buffer allocation and ensures that a session only
sees ready connections.
The frontend's client timeout is used as the handshake timeout. It
is likely that another timeout will be used in the future.
Some parts of the sock_ops structure were only used by the stream
interface and have been moved into si_ops. Some of them were callbacks
to the stream interface from the connection and have been moved into
app_cb as they're the application seen from the connection (later,
health-checks will need to use them). The rest has moved to data_ops.
Normally at this point the connection could live without knowing about
stream interfaces at all.
We need to have a generic function to be called by upper layers when buffer
flags have been updated (the si->update function). Until now, both sock_raw
and sock_ssl had their own version, which was basically a copy-paste. Since these
functions are only used to update stream interface flags, it is logical to
have them handled by the stream interface code.
This allowed us to remove the stream_interface-specific update function from
sock_raw and sock_ssl which now use the generic code.
The stream_sock_update_conn callback has also been more appropriately renamed
conn_notify_si() since it's meant to be called by lower layers to notify the
SI and possibly upper layers about incoming changes.
This is a second attempt at getting rid of FD_WAIT_*. Now the situation is
much better since native I/O handlers can directly manipulate the FD using
fd_{poll|want|stop}_* and the connection handlers manipulate connection-level
flags using the conn_{data|sock}_* equivalent.
Proceeding this way ensures that the connection flags always reflect the
reality even after data<->handshake switches.
Now the connection handler, the handshake callbacks and the I/O callbacks
make use of the connection-layer polling functions to enable or disable
polling on a file descriptor.
Some changes still need to be done to avoid using the FD_WAIT_* constants.
The conflicts we're facing with polling is that handshake handlers have
precedence over data handlers and may change the polling requirements
regardless of what is expected by the data layer. This causes issues
such as missed events.
The real need is to have three polling levels :
- the "current" one, which is effective at any moment
- the data one, which reflects what the data layer asks for
- the sock one, which reflects what the socket layer asks for
Depending on whether a handshake is in progress or not, either one of the
last two will replace the current one, and the change will be propagated
to the lower layers.
At the moment, the shutdown status is not considered, and only handshakes
are used to decide which layer to choose. This will probably change.
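A minimal sketch of the selection between these levels, with
illustrative flag bits (haproxy's real CO_FL_* values differ):

    #define CO_FL_CURR_RD_ENA  0x0001  /* currently effective */
    #define CO_FL_CURR_WR_ENA  0x0002
    #define CO_FL_DATA_RD_ENA  0x0004  /* requested by the data layer */
    #define CO_FL_DATA_WR_ENA  0x0008
    #define CO_FL_SOCK_RD_ENA  0x0010  /* requested by the socket layer */
    #define CO_FL_SOCK_WR_ENA  0x0020
    #define CO_FL_HANDSHAKE    0x0100  /* a handshake is in progress */

    /* While a handshake is in progress the sock level wins, otherwise
     * the data level does; the result maps onto CO_FL_CURR_*_ENA and
     * would then be propagated to the FD.
     */
    static unsigned int effective_polling(unsigned int flags)
    {
            if (flags & CO_FL_HANDSHAKE)
                    return (flags & (CO_FL_SOCK_RD_ENA |
                                     CO_FL_SOCK_WR_ENA)) >> 4;
            return (flags & (CO_FL_DATA_RD_ENA | CO_FL_DATA_WR_ENA)) >> 2;
    }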
Handshake handlers are not called anymore from the data handlers; they're
only called from the connection handler when their flag is set.
Also, this move has uncovered an issue with the stream interface notifier:
it doesn't consider the FD_WAIT_* flags possibly set by the handshake
handlers. This will result in a stuck handshake when no data is in the
output buffer. In order to cover this, for now we'll perform the EV_FD_SET
in the SSL handshake function, but this needs to be addressed separately
from the stream interface operations.
This new flag is used to indicate that the connection was already
connected. It can be used by I/O handlers to know that a connection
has just completed. It is used by stream_sock_update_conn(), allowing
the sock_opt handlers not to manipulate the SI timeout nor the
BF_WRITE_NULL flag anymore.
The sock_ops I/O callbacks made use of an FD till now. This has become
inappropriate and the struct connection is much more useful. It also
fixes the race condition introduced by previous change.
The socket data layer code must only focus on moving data between a
socket and a buffer. We need a special stream interface handler to
update the stream interface and the file descriptor status.
At the moment the code works but suffers from a race condition caused
by its API : the read/write callbacks still make use of the fd instead
of using the connection. And when a double shutdown is performed, a call
to ->write() after ->read() processed an error results in dereferencing
a NULL fdtab[]->owner. This is only a temporary issue which doesn't need
to be fixed now since this will automatically go away when the functions
change to use the connection instead.
Use a single tcp_connect_probe() instead of tcp_connect_write() and
tcp_connect_read(). We call this one only when no data layer function
has been processed, so this is a fallback to test for completion of
a connection attempt.
With this done, we don't have the need for any direct I/O callback
anymore.
The function still relies on ->write() to wake the stream interface up,
so it's not finished.
This handshake handler must be independent, so move it away from
proto_tcp. It has a dedicated connection flag. It is tested before
I/O handlers and automatically removes the CO_FL_WAIT_L4_CONN flag
upon success.
It also sets the BF_WRITE_NULL flag on the stream interface and
stops the SI timeout. However it does not perform the task_wakeup(),
and relies on the data handler to do so for now. The SI wakeup will
have to be moved elsewhere anyway.
It's inappropriate to remove FD_POLL_IN and FD_POLL_OUT in the IO callback
handlers, first because they shouldn't care about this, and second because
it will make it harder to chain multiple callers.
So let's flush these flags only once for all callers, in the connection handler.
Right now, the HUP and ERR flags are still flushed in each IO handler to
avoid multiple calls. This will probably have to be fixed later.
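A minimal sketch of that single flush; FD_POLL_IN and FD_POLL_OUT do
exist under these names in fd.h, but the values and surrounding type
below are simplified:

    #define FD_POLL_IN   0x01
    #define FD_POLL_OUT  0x04

    struct fdtab_entry { unsigned int ev; };

    /* Called once at the end of the connection handler, after all
     * handshake and I/O callbacks have run, instead of each I/O
     * handler clearing the bits itself.
     */
    static void flush_poll_events(struct fdtab_entry *fdt)
    {
            fdt->ev &= ~(FD_POLL_IN | FD_POLL_OUT);
    }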