We simply remove these functions and replace their calls with the
appropriate ones :
- if we're in the data phase, we can simply report wait on the FD
- if we're in the socket phase, we may also have to signal the
desire to read/write on the socket because it might not be
active yet.
These flags were used to report the readiness of the file descriptor.
Now this readiness is directly checked at the file descriptor itself.
This removes the need for constantly synchronizing updates between the
file descriptor and the connection and ensures that all layers share
the same level of information.
For now, the readiness is updated in conn_{sock,data}_poll_* by directly
touching the file descriptor. This must move to the lower layers instead
so that these functions can disappear as well. In this state, the change
works but is incomplete. It's sensible enough to avoid making it more
complex.
Now the sock/data updates become much simpler because they just have to
enable/disable access to a file descriptor and not to care anymore about
its readiness.
This commit heavily changes the polling system in order to definitely
fix the frequent breakage of SSL which needs to remember the last
EAGAIN before deciding whether to poll or not. Now we have a state per
direction for each FD, as opposed to a previous and current state
previously. An FD can have up to 8 different states for each direction,
each of which being the result of a 3-bit combination. These 3 bits
indicate a wish to access the FD, the readiness of the FD and the
subscription of the FD to the polling system.
This means that it will now be possible to remember the state of a
file descriptor across disable/enable sequences that generally happen
during forwarding, where enabling reading on a previously disabled FD
would result in forgetting the EAGAIN flag it met last time.
Several new state manipulation functions have been introduced or
adapted :
- fd_want_{recv,send} : enable receiving/sending on the FD regardless
of its state (sets the ACTIVE flag) ;
- fd_stop_{recv,send} : stop receiving/sending on the FD regardless
of its state (clears the ACTIVE flag) ;
- fd_cant_{recv,send} : report a failure to receive/send on the FD
corresponding to EAGAIN (clears the READY flag) ;
- fd_may_{recv,send} : report the ability to receive/send on the FD
as reported by poll() (sets the READY flag) ;
Some functions are used to report the current FD status :
- fd_{recv,send}_active
- fd_{recv,send}_ready
- fd_{recv,send}_polled
Some functions were removed :
- fd_ev_clr(), fd_ev_set(), fd_ev_rem(), fd_ev_wai()
The POLLHUP/POLLERR flags are now reported as ready so that the I/O layers
knows it can try to access the file descriptor to get this information.
In order to simplify the conditions to add/remove cache entries, a new
function fd_alloc_or_release_cache_entry() was created to be used from
pollers while scanning for updates.
The following pollers have been updated :
ev_select() : done, built, tested on Linux 3.10
ev_poll() : done, built, tested on Linux 3.10
ev_epoll() : done, built, tested on Linux 3.10 & 3.13
ev_kqueue() : done, built, tested on OpenBSD 5.2
We're completely changing the way FDs will be polled. There will be no
more speculative I/O since we'll know the exact FD state, so these will
only be cached events.
First, let's fix a few field names which become confusing. "spec_e" was
used to store a speculative I/O event state. Now we'll store the whole
R/W states for the FD there. "spec_p" was used to store a speculative
I/O cache position. Now let's clearly call it "cache".
We're completely changing the way FDs will be polled. First, let's fix
a few field names which become confusing. "spec_e" was used to store a
speculative I/O event state. Now we'll store the whole R/W states for
the FD there.
Recent commit d7ad9f5 ("MAJOR: channel: add a new flag CF_WAKE_WRITE to
notify the task of writes") was not correct. It used to wake up the task
as soon as there was some write activity and the flag was set, even if there
were still some data to be forwarded. This resulted in process_session()
being called a lot when transfering chunk-encoded HTTP responses made of
very large chunks.
The purpose of the flag is to wake up only a task waiting for some
room and not the other ones, so it's totally counter-productive to
wake it up as long as there are data to forward because the task
will not be allowed to write anyway.
Also, the commit above was taking some risks by not considering
certain events anymore (eg: state != SI_ST_EST). While such events
are not used at the moment, if some new features were developped
in the future relying on these, it would be better that they could
be notified when subscribing to the WAKE_WRITE event, so let's
restore the condition.
Recent commit d7ad9f5 ("MAJOR: channel: add a new flag CF_WAKE_WRITE to
notify the task of writes") introduced this new CF_WAKE_WRITE flag that
an analyser which requires some free space to write must set if it wants
to be notified.
Unfortunately, some places were missing. More specifically, the
compression engine can rarely be stuck by a lack of output space,
especially when dealing with non-compressible data. It then has to
stop until some pending data are flushed and for this it must set
the CF_WAKE_WRITE flag. But these cases were missed by the commit
above.
Fortunately, this change was introduced very recently and never
released, so the impact was limited.
Huge thanks to Sander Klein who first reported this issue and who kindly
and patiently provided lots of traces and test data that made it possible
to reproduce, analyze, then fix this issue.
Patrick Hemmer reported that using unique_id_format and logs did not
report the same unique ID counter since commit 9f09521 ("BUG/MEDIUM:
unique_id: HTTP request counter must be unique!"). This is because
the increment was done while producing the log message, so it was
performed twice.
A better solution consists in fetching a new value once per request
and saving it in the request or session context for all of this
request's life.
It happens that sessions already have a unique ID field which is used
for debugging and reporting errors, and which differs from the one
sent in logs and unique_id header.
So let's change this to reuse this field to have coherent IDs everywhere.
As of now, a session gets a new unique ID once it is instanciated. This
means that TCP sessions will also benefit from a unique ID that can be
logged. And this ID is renewed for each extra HTTP request received on
an existing session. Thus, all TCP sessions and HTTP requests will have
distinct IDs that will be stable along all their life, and coherent
between all places where they're used (logs, unique_id header,
"show sess", "show errors").
This feature is 1.5-specific, no backport to 1.4 is needed.
The fetch "req.ssl_ver" is not declared as explicit acl. If it is used
as implicit ACL, the acl engine detect SMP_T_UINT output type and choose
to use the default interger parser: pat_parse_int(). This fetch needs the
parser pat_parse_dotted_ver().
This patch declare explicit ACL named "req.ssl_ver" that use the good
parser function pat_parse_dotted_ver().
Checks used not to precisely report the errors that were detected at the
connection layer (eg: too many SSL connections). Using chk_report_conn_err()
makes this possible.
Now when a connection error happens, it is reported in the connection
so that upper layers know exactly what happened. This is particularly
useful with health checks and resources exhaustion.
It is quite often that an connection error only reports "socket error" with
no more information. This is especially problematic with health checks where
many causes are possible, including resource exhaustion which do not lead to
a valid errno code. So let's add explicit codes to cover these cases.
Actually the values returned by this function is never used. All the
callers just check if the resultat is non-zero. Before this patch, the
function returns the length of the produced content. This value is not
useful because is returned twice: the first time in the return value and
the second time in the <binstrlen> argument. Now the function returns
the number of bytes consumed from <source>.
The functions pat_parse_* must return 0 if fail and the number of
elements eated from **text if not fail. The function pat_parse_bin()
returns 0 or the length parsed. This causes a segfault. I just apply the
double operator "!" on the result of the function pat_parse_bin() and
the return value value match the expected value.
Now we can more safely rely on the connection state to decide how to
drain and what to do when data are drained. Callers don't need to
manipulate the file descriptor's state anymore.
Note that it also removes the need for the fix ea90063 ("BUG/MEDIUM:
stream-int: fix the keep-alive idle connection handler") since conn_drain()
correctly sets the polling flags.
When an incoming shutdown or error is detected, we know that we
can safely close without disabling lingering. Do it in tcp_drain()
so that we don't have to do it from each and every caller.
Till now there was no way to know from a connection if a previous
call to drain() had done any change. This function is used to drain
incoming data and to update the connection's flags at the same time.
It also correctly sets the polling flags on the connection if the
drain function indicates inability to receive. This function will
be used preferably over ctrl->drain() when a connection is used.
It was not possible to know if the drain() function had hit an
EAGAIN, so now we change the API of this function to return :
< 0 if EAGAIN was met
= 0 if some data remain
> 0 if a shutdown was received
The accept loop used to force fd_poll_recv() even in places where it
was not completely appropriate (eg: unexpected errors). It does not
yet cause trouble but will do with the upcoming polling changes. Let's
use it only where relevant now. EINTR/ECONNABORTED do not result in
poll() anymore but the failed connection is simply skipped (this code
dates from 1.1.32 when error codes were first considered).
Some rare unexplained busy loops were observed on versions up to 1.5-dev19.
It happens that if a file descriptor happens to be disabled for both read and
write while it was speculatively enabled for both and this without creating a
new update entry, there will be no way to remove it from the speculative I/O
list until some other changes occur. It is suspected that a double sequence
such as enable_both/disable_both could have led to this situation where an
update cancels itself and does not clear the spec list in the poll loop.
While it is unclear what I/O sequence may cause this situation to arise, it
is safer to always add the FD to the update list if nothing could be done on
it so that the next poll round will automatically take care of it.
This is 1.5-specific, no backport is needed.
Recent commit abf08d9 ("BUG/MAJOR: connection: fix mismatch between rcv_buf's
API and usage") accidentely broke SSL by relying on an uninitialized value to
enter the read loop.
Many thanks to Cyril Bont and Steve Ruiz for reporting this issue.
The value of the variable "appctx->ctx.map.ent" is used after the loop,
but its value has changed. The variable "value" is initialized and
contains the good value.
This is a recent bug, no backport is needed.
Recent commit 4448925 ("BUILD/MINOR: listener: remove a glibc warning on accept4()")
broke accept4() on some systems because the glibc's version may now conflict with
the local one.
Steve Ruiz reported some reproducible crashes with HTTP health checks
on a certain page returning a huge length. The traces he provided
clearly showed that the recv() call was performed twice for a total
size exceeding the buffer's length.
Cyril Bont tracked down the problem to be caused by the full buffer
size being passed to rcv_buf() in event_srv_chk_r() instead of passing
just the remaining amount of space. Indeed, this change happened during
the connection rework in 1.5-dev13 with the following commit :
f150317 MAJOR: checks: completely use the connection transport layer
But one of the problems is also that the comments at the top of the
rcv_buf() functions suggest that the caller only has to ensure the
requested size doesn't overflow the buffer's size.
Also, these functions already have to care about the buffer's size to
handle wrapping free space when there are pending data in the buffer.
So let's change the API instead to more closely match what could be
expected from these functions :
- the caller asks for the maximum amount of bytes it wants to read ;
This means that only the caller is responsible for enforcing the
reserve if it wants to (eg: checks don't).
- the rcv_buf() functions fix their computations to always consider
this size as a max, and always perform validity checks based on
the buffer's free space.
As a result, the code is simplified and reduced, and made more robust
for callers which now just have to care about whether they want the
buffer to be filled or not.
Since the bug was introduced in 1.5-dev13, no backport to stable versions
is needed.
The accept4() Linux syscall requires _GNU_SOURCE on ix86, otherwise
it emits a warning. On other archs including x86_64, this problem
doesn't happen. Thanks to Charles Carter from Sigma Software for
reporting this.
If the pattern is set as case insensitive, the string comparison
is executed twice. The first time is insensitive comparison, the
second is sensitive.
This is a recent bug, no backport is needed.
This reverts commit 1208266356.
It randomly breaks SSL. What happens is that if the SSL response is
read at once by the SSL stack and is partially delivered to the buffer,
then there's no way to read the next parts because we wait for some
polling first.
So we'll fix this after the polling rework.
A config where multiple servers have the same name in the same backend is
prone to a number of issues : logs are not really exploitable, stats get
really tricky and even harder to change, etc...
In fact, it can be safe to have the same name between multiple servers only
when their respective IDs are known and used. So now we detect this situation
and emit a warning for the first conflict detected per server if any of the
servers uses an automatic ID.
The code which enables tunnel mode or TCP transfers is rarely used
and at most once per session. Putting it in an unlikely() clause
reduces the length of the hot path of process_session() which is
already quite long, and also slightly reduces its overall size.
Some measurements show a steady gain of about 0.2% thanks to this.
This function is called twice per request, and does almost always nothing.
Better use an inline version to avoid entering it when we can.
About 0.5% additional performance was gained this way.
si_connect() used to only return SI_ST_CON. But it already detect the
connection reuse and is the function which avoids calling connect().
So it already knows the connection is valid and reuse. Thus we make it
return SI_ST_EST when a connection is reused. This means that
connect_server() can return this state and sess_update_stream_int()
as well.
Thanks to this change, we don't need to leave process_session() in
SI_ST_CON state to immediately enter it again to switch to SI_ST_EST.
Implementing this removes one call to process_session() per request
in keep-alive mode. We're now at 2 calls per request, which is the
minimum (one for the request and another one for the response). The
number of calls to http_wait_for_response() has also dropped from 2
to one.
Tests indicate a performance gain of about 2.6% in request rate in
keep-alive mode. There should be no gain in http-server-close() since
we don't use this faster path.
At the moment it is possible in sess_prepare_conn_req() to switch to the
established state when the target is an applet. But sess_update_stream_int()
will soon also have the ability to set the established state via
connect_server() when a connection is reused, leading to a synchronous
connect.
So prepare the code to handle this SI_ST_ASS -> SI_ST_EST transition, which
really matches what's done in the lower layers.
Currently there are 3 places in the code where t_connect is set after
switching to state SI_ST_EST, and a fourth one will soon come. Since
all these places lead to an immediate call to sess_establish() to
complete the session establishment, better move that measurement
there.
It's a bit hasardous to wipe out all channel flags, this flag should
be left intact as it protects against recursive calls. Fortunately,
we have no possibility to meet this situation with current applets,
but better fix it before it becomes an issue.
This bug has been there for a long time, but it doesn't seem worth
backporting the fix.
As soon as we connect to the server, we want to limit the number of
recvfrom() on the response path because most of the time a single
call will retrieve enough information.
At the moment this is only done in the HTTP response parser, after
some reads have already failed, which is too late. We need to do
that at the earliest possible instant. It was already done for the
request side by frontend_accept() for the first request, and by
http_reset_txn() for the next requests.
Thanks to this change, there are no more failed recvfrom() calls in
keep-alive mode.
Since commit 6b66f3e ([MAJOR] implement autonomous inter-socket forwarding)
introduced in 1.3.16-rc1, we've been relying on a stupid mechanism to wake
up the task after a write, which was an exact copy-paste of the reader side.
The principle was that if we empty a buffer and there's no forwarding
scheduled or if the *producer* is not in a connected state, then we wake
the task up.
That does not make any sense. It happens to wake up too late sometimes (eg,
when the request analyser waits for some room in the buffer to start to
work), and leads to unneeded wakeups in client-side keep-alive, because
the task is woken up when the response is sent, while the analysers are
simply waiting for a new request.
In order to fix this, we introduce a new channel flag : CF_WAKE_WRITE. It
is designed so that an analyser can explicitly request being notified when
some data were written. It is used only when the HTTP request or response
analysers need to wait for more room in the buffers. It is automatically
cleared upon wake up.
The flag is also automatically set by the functions which try to write into
a buffer from an applet when they fail (bi_putblk() etc...).
That allows us to remove the stupid condition above and avoid some wakeups.
In http-server-close and in http-keep-alive modes, this reduces from 4 to 3
the average number of wakeups per request, and increases the overall
performance by about 1.5%.
This reverts commit f3221f99ac.
Igor reported some very strange breakage of his stats page which is
clearly caused by the chunking, though I don't see at first glance
what could be wrong. Better revert it for now.
In theory the principle is simple as we just need to send HTTP chunks
if the client is 1.1 compatible. In practice it's harder because we
have to append a CR LF after each block of data and we're never sure
to have the room for this. In order not to have to deal with this, we
instead send the CR LF prior to each chunk size. The only issue is for
the first chunk and for this reason we avoid to send the empty header
line when using chunked encoding.
Since the applet rework and the removal of the inter-task applets,
we must not clear the stream-interface's owner task anymore otherwise
we risk a crash when maintaining keep-alive with an applet. This is
not possible right now so there is no impact yet, but this bug is not
easy to track down. No backport is needed.
This value is stored as unsigned in chn->to_forward. Having it defined
as signed makes it impossible to pass channel_forward() a previously
saved value because the argument will be zero-extended during the
conversion to long long, while the test will be performed using sign
extension. There is no impact on existing code right now.
When enabling a tracked server via the web interface, we must first
check if the server tracks another one and the state of this tracked
server, just like the command line does.
Failure to do so causes incorrect logs to be emitted when the server
is enabled :
[WARNING] 361/212556 (2645) : Server bck2/srv3 is DOWN via bck2/srv2. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 361/212603 (2645) : Server bck2/srv3 is DOWN for maintenance.
--> enable server now
[WARNING] 361/212606 (2645) : Server bck2/srv3 is UP (leaving maintenance).
With this fix, it's correct now :
[WARNING] 361/212805 (2666) : Server bck2/srv3 is DOWN via bck2/srv2. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 361/212813 (2666) : Server bck2/srv3 is DOWN for maintenance.
--> enable server now
[WARNING] 361/212821 (2666) : Server bck2/srv3 is DOWN via bck2/srv2. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
It does not seem necessary to backport this fix, considering that it
depends on extremely fragile behaviours, there are more risks of breakage
caused by a backport than the current inconvenience.
Recent fix 02541e8 (BUG/MEDIUM: checks: servers must not start in
slowstart mode) failed to consider one case : a server chich is not
checked at all can be disabled and has to support being enabled
again. So we must also enter the set_server_up() function when the
checks are totally disabled.
No backport is needed.