Sometimes when debugging haproxy, it is important to take a full
snapshot of all sessions and their respective states. Until now this
was complicated to do because we had to use scripts, and sessions
would vanish between two runs.
Now with this command we have the same output as "show sess $id"
but for all sessions in the table. This is a debugging command only;
it should only be used by developers as it is never guaranteed to
work perfectly!
Commit 9b6700f added "v6only". As suggested by Vincent Bernat, it is
sometimes useful to have the opposite option to force binding to
both protocols when the system is configured to bind to v6 only by
default. This option does exactly this. v6only still has precedence.
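For illustration, here is a minimal C sketch of the socket-level effect
this option relies on (this is not the HAProxy source and the helper
name is ours): clearing IPV6_V6ONLY on an AF_INET6 socket before bind()
lets it accept both IPv4 and IPv6 even when the system defaults to
v6-only.

    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Ask the kernel to accept both IPv4 and IPv6 on an AF_INET6
     * socket; returns -1 with errno set if dual-stack is refused. */
    static int clear_v6only(int fd)
    {
        int zero = 0;

        return setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY,
                          &zero, sizeof(zero));
    }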
Depending on the content-types and accept-encoding fields, some responses
might or might not be compressed. Let's have a counter of the number of
compressed responses and report it in the stats to help improve compression
usage.
Some cosmetic issues were fixed in the CSV output too (missing commas at the
end).
It's interesting to know the average compression ratio obtained on
frontends and backends without having to compute it by hand, so let's
report it in the HTML stats.
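As a rough sketch of how such a ratio can be derived from the existing
in/out byte counters (the names below are illustrative, not the actual
stats fields):

    /* Share of input bytes saved by compression, in percent. */
    static unsigned int comp_savings_pct(unsigned long long comp_in,
                                         unsigned long long comp_out)
    {
        if (!comp_in || comp_out >= comp_in)
            return 0;

        return (unsigned int)((comp_in - comp_out) * 100ULL / comp_in);
    }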
The conn_local_send_proxy() function has to retrieve the local and remote
addresses, but the getpeername() and getsockname() functions may fail until
the connection is established. So now we catch this error and poll for write
when this happens.
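An illustrative sketch of that pattern (not the actual
conn_local_send_proxy() code; the helper and its return convention are
ours): the lookups are retried once write readiness is reported instead
of being treated as a fatal error.

    #include <errno.h>
    #include <sys/socket.h>

    /* Returns 1 when both addresses were retrieved, 0 when the caller
     * should poll for write and retry, -1 on a real error. */
    static int try_get_addresses(int fd, struct sockaddr_storage *local,
                                 struct sockaddr_storage *remote)
    {
        socklen_t llen = sizeof(*local);
        socklen_t rlen = sizeof(*remote);

        if (getsockname(fd, (struct sockaddr *)local, &llen) == -1 ||
            getpeername(fd, (struct sockaddr *)remote, &rlen) == -1) {
            if (errno == ENOTCONN || errno == EINPROGRESS)
                return 0; /* not established yet: wait for write */
            return -1;
        }
        return 1;
    }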
If an uncaught CO_FL_ERROR flag on a connection is detected, we
immediately go to the wakeup function. This ensures that even if
an error is asynchronously delivered, we don't risk re-enabling
polling or doing unexpected things in the handshake handlers.
Commit 0ffde2cc in 1.5-dev13 tried to always disable polling on file
descriptors when errors were encountered. Unfortunately it did not
always succeed in doing so because it relied on detecting polling
changes to disable it. Let's use a dedicated conn_stop_polling()
function that is unconditionally called upon error instead.
This fixes a busy loop observed when a health check makes
use of the send-proxy protocol and fails before the connection can
be established.
Commit 24db47e0 tried to improve support for delayed ACK upon connect
but it was incomplete, because checks with the proxy protocol would
always enable polling for data receive and there was no way of
distinguishing data polling and delayed ack.
So we add a distinct delack flag to the connect() function so that
the caller decides whether or not to use a delayed ack regardless
of pending data (eg: when send-proxy is in use). Doing so covers all
combinations of { (check with data), (sendproxy), (smart-connect) }.
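To illustrate the decoupling (the names below are hypothetical, not
HAProxy's actual API): the delayed-ACK request travels as its own flag
instead of being inferred from the presence of pending data, so a
send-proxy check can set or clear it independently.

    /* Hypothetical connect options, for illustration only. */
    enum connect_opts {
        CONNECT_SKETCH_HAS_DATA = 1 << 0, /* data will be sent right away */
        CONNECT_SKETCH_DELACK   = 1 << 1, /* explicitly request delayed ACK */
    };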
When leaving, during the deinit() process, prune_acl_expr() is called to
delete all ACL expressions. A bug was introduced with commit 34db1084 that
caused every other expression argument to be skipped, and more annoyingly,
it introduced the risk of scanning past the arg list and crashing or
freezing the old process during a reload.
Credits for finding this issue go to Dmitry Sivachenko, who first
reported it and then did a lot of research to narrow it down to a
minimal configuration.
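For illustration, a simplified rendition of this class of bug (this is
not the actual prune_acl_expr() code): with a sentinel-terminated
argument list, advancing by two entries at a time frees only every
other argument and can step over the terminator, scanning past the end
of the list.

    #include <stdlib.h>

    static void free_args_buggy(char **args)
    {
        /* BUG: "+= 2" skips one argument out of two and may jump over
         * the terminating NULL, after which foreign memory is read. */
        for (char **arg = args; *arg; arg += 2)
            free(*arg);
    }

    static void free_args_fixed(char **args)
    {
        for (char **arg = args; *arg; arg++)
            free(*arg);
    }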
Since 1.5-dev9, ACLs support multiple args. The changes performed in
acl_find_targets() were bogus as they were not always applied to the
current argument being processed, but sometimes to the first one only.
Fortunately, until now, all ACLs which support resolvable arguments
take them in the first position only, so there was no impact.
If some listeners are mistakenly configured with 0 as the maxaccept value,
then we now consider them as limited to one accept() at a time. This
avoids issues such as the one fixed by the previous commit.
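A sketch of the sanitization described above (the structure and field
name are illustrative, not the real struct listener):

    struct listener_sketch {
        int maxaccept; /* max number of accept() calls per wakeup */
    };

    static void sanitize_maxaccept(struct listener_sketch *l)
    {
        if (l->maxaccept < 1)
            l->maxaccept = 1; /* 0 would mean "never accept" and spin */
    }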
Recent commit 16a214 to move the maxaccept parameter to listeners didn't
set it on the peers' listeners, resulting in the value zero being used
there. This caused a busy loop for each peers section, because no incoming
connection could be accepted.
Thanks to Hervé Commowick for reporting this issue.
Several places got the connection close sequence wrong because it
was not obvious. In practice we always need the same sequence when
aborting, so let's have a common function for this.
Commit a522f801 moved a call to __conn_data_want_recv() just after the
connect() call, which is not 100% correct. First, it does not take errors
into account, even though this is harmless. Second, this change will only
be taken into account after the next call to conn_data_polling_update(), which
is not necessarily what is expected (eg: if an error is only reported on
the recv side).
So let's use conn_data_poll_recv() instead, which directly subscribes
the event to polling.
Since the last commit, some timeouts were converted into an error to
report the status, and as a result the socket was not closed because
this was supposed to have been done during the wake() call.
Close the socket as soon as the timeout is detected to fix the issue.
We also now make sure the connection flags are initialized first.
Until now, the check socket was closed in the task which handles the
check, which can sometimes happen substantially later when many tasks
are running. It's much cleaner to close() in the wake call, which also
helps remove some FD management from the task itself.
The code is faster and smaller, and fast health checks show a more
predictable behaviour.
Pure TCP checks only use the SYN/ACK received in response to a SYN. By forcing
the system to use delayed ACKs, it is possible to send an RST instead
of the ACK and thus ensure that the application will never be needlessly
woken up. This avoids error logs or counters on checked components since
the application is never made aware of this connection which dies in the
network stack.
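A Linux-oriented sketch of the mechanism (the exact HAProxy call site
may differ and the helper name is ours): disabling quick ACKs before
connect() asks the stack to delay the ACK of the SYN/ACK, so that an
immediate close can turn into an RST and the server application never
sees the connection.

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    static void request_delayed_ack(int fd)
    {
    #ifdef TCP_QUICKACK
        int zero = 0;

        /* best effort: not available on all systems */
        setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &zero, sizeof(zero));
    #else
        (void)fd; /* nothing to do without TCP_QUICKACK */
    #endif
    }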
The process_chk() function still did not consider the timeout when
it was woken up, so a spurious wakeup could trigger a false timeout. Some
checks were now redundant or could not be triggered (eg: L7 timeout).
So remove them and rearrange the timeout detection.
The porting of checks to using connections was totally bogus. Some checks
were considered successful as soon as the connection was established,
regardless of any response. Some errors would be triggered upon recv
if polling was enabled for send or if the send channel was shut down.
Now the behaviour is much better. It would be cleaner to perform the
fd_delete() in wake_srv_chk() and to process failures and timeouts
separately, but this is already a good start.
Some server check flag names were not properly chosen and caused
analysis trouble, especially the CHK_RUNNING one which does not
mean that a check is running but that the server is running...
Here is the rename:
CHK_RUNNING -> CHK_PASSED
CHK_ERROR -> CHK_FAILED
Some checks which do not induce a close from the server accumulate
local TIME_WAIT sockets because they're cleanly shut down. Typically
TCP probes cause this. This is very problematic when there are many
servers, when the checks are fast or when local source ports are rare.
So now we'll disable lingering on the socket instead of sending a
shutdown. Before doing this we try to drain any possibly pending data.
That way we avoid sending an RST when the server has closed first.
This change means that some servers will see more RSTs, but this is
needed to avoid local source port starvation.
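One possible plain-socket rendition of the two steps above (the real
check code is more nuanced; the helper name is ours): drain whatever
the server may already have sent, then close with lingering disabled
so the local port does not end up in TIME_WAIT.

    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void close_check_socket(int fd)
    {
        char buf[512];
        struct linger nolinger = { .l_onoff = 1, .l_linger = 0 };

        /* best-effort drain of pending data before closing */
        while (recv(fd, buf, sizeof(buf), MSG_DONTWAIT) > 0)
            ;

        setsockopt(fd, SOL_SOCKET, SO_LINGER, &nolinger, sizeof(nolinger));
        close(fd); /* emits an RST instead of going through TIME_WAIT */
    }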
When a check succeeds, it used to only disable receive events while
it should disable both directions. The problem is that if the send
event was reported too, it could re-enable the recv event. In theory
this is not a problem as the task is going to be woken up, but if
there are many tasks in the queue and this task is not processed
immediately, we could theoretically face a storm of unprocessed events
(typically POLL_HUP).
So better stop both directions, prevent the send side from enabling
recv and have the process_chk() code enable both directions. This
will also help detecting closes before the check is sent.
Note that all this mess has been inherited from the old code that used
the fd as a flag to report if a check was running. We should have a
dedicated flag and perform the fd_delete() in wake_srv_chk() instead.
Health checks currently still use the connection's fd to know whether
a check is running (this needs to change). When a health check
immediately fails during connect() because of a lack of local resource
(eg: port), we failed to unset the fd, so each time process_chk()
woke up after such an error, it believed a check was still running
and closed the fd again instead of starting a new check. This
could result in other connections being closed because they were
assigned the same fd value.
The bug is only marked medium because when this happens, the system
is already in a bad state.
A comment was added above tcp_connect_server() to clarify that the
fd is *not* valid on error.
It was a bit frustrating to have no idea about the bandwidth saved by
HTTP compression. Now we have per-frontend and per-backend stats. The
stats on the HTTP interface are shown in a hover title in the "bytes out"
column if at least something was fed to the compressor. 3 new columns
appeared in the CSV stats output.
Some users need more than 64 characters to log large cookies. The limit
was set to 63 characters (and not 64 as previously documented). Now it
is possible to change this using the global "tune.http.cookielen" setting
if required.
The connection handling changes introduced in 1.5-dev12 caused a
regression with commit 9bf9c14c. The issue is that the stream_sock_read0()
callback must update the channel flags to indicate that the side is closed
so that when process_session() is called, it can propagate the close to the
other side and terminate the session.
The issue only appears in HTTP tunnel mode. It is a bit tricky to
trigger: it requires that the request channel is full with data flowing
from the client to the server, that both the response and the read0()
are received at once so that the flags are not updated, and that the HTTP
analyser switches to tunnel mode without being informed that the request
write side is closed. After that, process_session() does not know that the
connection has to be aborted either, no further event appears on this side,
and the connection stays there forever.
Many thanks to Igor at owind for testing several snapshots and for providing
valuable traces to reproduce and diagnose the issue!
New option 'maxcompcpuusage' in global section.
It sets the maximum CPU usage HAProxy can reach before stopping the
compression for new requests or decreasing the compression level of
current requests. It works like 'maxcomprate' but is based on the
measured idle time instead of the compression rate.
When using the compression rate limit, the compression level did not
respect the maximum compression level during a session because the
test was performed on the wrong variable.
Some very specifically scheduled workloads could sometimes get stuck when
data receive was disabled due to buffer full then re-enabled due to a full
send(). An extra conn_data_want_recv() call was needed in this specific case.
This bug was introduced with connection rework and polling changes in dev12.
This patch makes changes in the http_response_forward_body state
machine. It checks whether the compression algorithm has consumed data
before swapping the temporary and the input buffers, which prevents
zero-sized zlib chunks.
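A simplified sketch of that guard (structure and names are illustrative,
not the real forwarding code): the buffers are swapped only when the
compressor consumed something, so an empty chunk can never be produced.

    #include <stddef.h>

    struct buf_sketch {
        char  *area;
        size_t len;
    };

    static void swap_if_consumed(struct buf_sketch *in,
                                 struct buf_sketch *tmp, size_t consumed)
    {
        struct buf_sketch t;

        if (!consumed)
            return; /* nothing compressed: keep the buffers untouched */

        t    = *in;
        *in  = *tmp;
        *tmp = t;
    }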
Disabling compression based on the content-type was improperly done since the
introduction of the COMP_READY flag, sometimes resulting in truncated responses.
global.tune.maxaccept was used for all listeners. This is really not
convenient when some listeners are bound to a single process and other
ones are bound to many processes.
Now we change the principle: we count the number of processes a listener
is bound to, and apply the maxaccept either entirely if there is a single
process, or divided by twice the number of processes in order to maintain
fairness.
The default limit has also been increased from 32 to 64 as it appeared that
on small machines, 32 was too low to achieve high connection rates.
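A sketch of the scaling rule (the helper name is ours; the floor of 1
is an assumption here, in the spirit of the earlier "0 accept" fix):

    static int scale_maxaccept(int maxaccept, int nb_bound_procs)
    {
        int v = maxaccept;

        /* single-process listeners keep the full budget; others get it
         * divided by twice the number of bound processes for fairness */
        if (nb_bound_procs > 1)
            v /= 2 * nb_bound_procs;

        return v > 0 ? v : 1;
    }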