16131 Commits

Author SHA1 Message Date
Willy Tarreau
78bbeb4a99 BUG/MAJOR: stats: correctly check for a possible divide error when showing compression ratios
Commit 5730c68b changed to display compression ratios based on 2xx
responses, but we should then check that there are such responses
instead of checking for requests. The risk is a divide error if there
are some requests but no 2xx yet (eg: redirect).
2012-11-26 16:44:48 +01:00
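The guard described in this commit can be sketched as follows. This is only an illustration, not the actual stats code: the function name `comp_rsp_pct` and its parameters are hypothetical. The point is that the divisor is the 2xx counter, so that counter is the one to test before dividing.

```c
#include <stdint.h>

/* Hypothetical helper (not from haproxy): percentage of 2xx responses
 * that were compressed. The divisor is the 2xx counter, so this is the
 * value checked against zero, not the request counter. */
unsigned int comp_rsp_pct(uint64_t comp_rsp, uint64_t http_2xx)
{
    if (http_2xx == 0)
        return 0; /* requests may exist (eg: redirects) with no 2xx yet */
    return (unsigned int)(100 * comp_rsp / http_2xx);
}
```

With this shape, a frontend that has only served redirects reports 0 instead of triggering a divide error.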
Willy Tarreau
0a80a8dbb2 MINOR: http: factor out the content-type checks
Let's only look up the content-type header once. This involves
inverting the condition which is not dramatic.

Also, we now always check the value length before comparing it, and we
always reset the ctx.idx before looking a header up. Otherwise that
could make header lookups depend on their on-wire order. It would be
a minor issue however since at worst it would cause some responses not
to be compressed.
2012-11-26 16:36:00 +01:00
Willy Tarreau
5730c68b46 MINOR: stats: compute the ratio of compressed response based on 2xx responses
Since only responses with status 200 can be compressed, let's only count the
ratio of compressed responses on the basis of the 2xx responses and not all
of them. Note that responses 206 are still included in this count but it gives
a better figure, especially for places where authentication is used and 401 is
common.
2012-11-26 16:19:46 +01:00
William Lallemand
d300261bab MINOR: compression: disable on multipart or status != 200
Compression is now disabled when the HTTP status code is not 200, because
compressing responses with some other status codes (e.g. 206, 416) can cause
issues.

Multipart messages should not be compressed either.
2012-11-26 16:02:58 +01:00
William Lallemand
859550e068 BUG/MINOR: compression: Content-Type is case insensitive
The Content-Type parameter must be case insensitive.
2012-11-26 16:02:58 +01:00
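A minimal sketch of the checks described in the compression commits above (status 200 only, no multipart, case-insensitive Content-Type, length checked before comparing). The function `may_compress` is hypothetical and does not reflect the real haproxy code layout.

```c
#include <stddef.h>
#include <strings.h>

/* Hypothetical helper (not from haproxy): returns 1 if a response may be
 * compressed, 0 otherwise. ctype/len describe the Content-Type value. */
int may_compress(int status, const char *ctype, size_t len)
{
    static const char multipart[] = "multipart";
    const size_t mlen = sizeof(multipart) - 1;

    /* Only plain 200 responses are candidates. */
    if (status != 200)
        return 0;

    /* Check the value length first, then compare ignoring case. */
    if (ctype && len >= mlen && strncasecmp(ctype, multipart, mlen) == 0)
        return 0;

    return 1;
}
```

Checking the length before the comparison keeps the lookup from reading past a short header value.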
Willy Tarreau
f003d375ec BUG/MINOR: http: don't report client aborts as server errors
If a client aborts with an abortonclose flag, the close is forwarded
to the server and when server response is processed, the analyser thinks
it's the server who has closed first, and logs flags "SD" or "SH" and
counts a server error. In order to avoid this, we now first detect that
the client has closed and log a client abort instead.

This likely is the reason why many people have been observing a small rate
of SD/SH flags without being able to find what the error was.

This fix should probably be backported to 1.4.
2012-11-26 13:50:02 +01:00
Willy Tarreau
909d517e3f MINOR: cli: improve output format for show sess $ptr
This change removes pointers for known types (stream_interface, ...),
adds buffer pointers and sizes, and moves buffer information to their
own line. The output is cleaner with shorter lines and slightly more
lines.
2012-11-26 03:04:41 +01:00
Willy Tarreau
5f9a8779b3 BUG/MAJOR: cli: show sess <id> may randomly corrupt the back-ref list
show sess <id> puts a backref into the session it's dumping. If the output
is interrupted, the backref cannot always be removed because it's only done
in the I/O handler. This can randomly corrupt the backref list when the
session closes, because it passes the pointer to the next session which
itself might be watched.

The case is hard to reproduce (hundreds of attempts) but monitoring systems
might encounter it frequently.

Thus we have to add a release handler which does the cleanup even when the
I/O handler is not called.

This issue should also be present in 1.4 so the patch should be backported.
2012-11-26 02:22:40 +01:00
Willy Tarreau
7615366c70 MINOR: cli: add support for the "show sess all" command
Sometimes when debugging haproxy, it is important to take a full
snapshot of all sessions and their respective states. Till now it
was complicated to do because we had to use scripts and sessions
would vanish between two runs.

Now with this command we have the same output as "show sess $id"
but for all sessions in the table. This is a debugging command only,
it should only be used by developers, as it is never guaranteed to
work perfectly!
2012-11-26 01:18:33 +01:00
Willy Tarreau
95898ac211 BUILD: buffer: fix another isprint() warning on solaris
This one came with recent commit be0efd8. Solaris wants ints, not chars.
2012-11-26 00:57:40 +01:00
Willy Tarreau
77e3af9e6f MINOR: tcp: add support for the "v4v6" bind option
Commit 9b6700f added "v6only". As suggested by Vincent Bernat, it is
sometimes useful to have the opposite option to force binding to the
two protocols when the system is configured to bind to v6 only by
default. This option does exactly this. v6only still has precedence.
2012-11-24 15:07:23 +01:00
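The precedence between the two bind options can be sketched like this. The helper names and the flag representation are assumptions made for illustration. Only `setsockopt()` with `IPV6_V6ONLY` is a real system interface, and it has to be applied before `bind()`.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 to force IPv6-only, 0 to force binding
 * to both protocols, -1 to keep the system default. */
int v6only_setting(int opt_v6only, int opt_v4v6)
{
    if (opt_v6only)
        return 1;  /* v6only has precedence over v4v6 */
    if (opt_v4v6)
        return 0;
    return -1;
}

/* Hypothetical helper: applies the setting to a not-yet-bound IPv6 socket.
 * Returns 0 on success or when nothing has to be changed, -1 on error. */
int apply_v6only(int fd, int opt_v6only, int opt_v4v6)
{
    int val = v6only_setting(opt_v6only, opt_v4v6);

    if (val < 0)
        return 0;
    return setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &val, sizeof(val));
}
```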
Willy Tarreau
5e16cbc3bd MINOR: stats: report the total number of compressed responses per front/back
Depending on the content-types and accept-encoding fields, some responses
might or might not be compressed. Let's have a counter of the number of
compressed responses and report it in the stats to help improve compression
usage.

Some cosmetic issues were fixed in the CSV output too (missing commas at the
end).
2012-11-24 14:54:13 +01:00
Willy Tarreau
f149d8f21e MINOR: stats: also report the computed compression savings in html stats
It's interesting to know the average compression ratio obtained on
frontends and backends without having to compute it by hand, so let's
report it in the HTML stats.
2012-11-24 14:06:49 +01:00
Willy Tarreau
9b6700f673 MINOR: tcp: add support for the "v6only" bind option
This option forces a socket to bind to IPv6 only when it uses the
default address (eg: ":::80").
2012-11-24 12:20:28 +01:00
Willy Tarreau
e3635edc88 BUG/MEDIUM: connection: local_send_proxy must wait for connection to establish
The conn_local_send_proxy() function has to retrieve the local and remote
addresses, but the getpeername() and getsockname() functions may fail until
the connection is established. So now we catch this error and poll for write
when this happens.
2012-11-24 11:23:04 +01:00
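The retry logic can be illustrated as below. This is a sketch under assumptions: the function name `fetch_conn_addrs` and its return convention are hypothetical, and treating `ENOTCONN` as the "not established yet" case is an assumption about which error is caught.

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper (not the real conn_local_send_proxy()):
 * returns  1 when both addresses were retrieved,
 *          0 when the connection is not established yet, in which case
 *            the caller should poll for write and try again,
 *         -1 on any other error. */
int fetch_conn_addrs(int fd, struct sockaddr_storage *local,
                     struct sockaddr_storage *remote)
{
    socklen_t llen = sizeof(*local);
    socklen_t rlen = sizeof(*remote);

    if (getpeername(fd, (struct sockaddr *)remote, &rlen) == -1 ||
        getsockname(fd, (struct sockaddr *)local, &llen) == -1)
        return (errno == ENOTCONN) ? 0 : -1;

    return 1;
}
```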
Willy Tarreau
6c560da279 BUG/MEDIUM: checks: report handshake failures
Up to now, only data layer failures were reported to the task, but
if a handshake failed from the beginning, the error was not reported
as a failure.
2012-11-24 11:14:45 +01:00
Willy Tarreau
9a92cd5985 MINOR: connection: abort earlier when errors are detected
If an uncaught CO_FL_ERROR flag on a connection is detected, we
immediately go to the wakeup function. This ensures that even if
an error is asynchronously delivered, we don't risk re-enabling
polling or doing unexpected things in the handshake handlers.
2012-11-24 11:12:13 +01:00
Willy Tarreau
36fb02c526 BUG/MEDIUM: connection: always disable polling upon error
Commit 0ffde2cc in 1.5-dev13 tried to always disable polling on file
descriptors when errors were encountered. Unfortunately it did not
always succeed in doing so because it relied on detecting polling
changes to disable it. Let's use a dedicated conn_stop_polling()
function that is unconditionally called upon error instead.

This managed to stop a busy loop observed when a health check makes
use of the send-proxy protocol and fails before the connection can
be established.
2012-11-24 11:09:07 +01:00
Willy Tarreau
f0837b259b MEDIUM: tcp: add explicit support for delayed ACK in connect()
Commit 24db47e0 tried to improve support for delayed ACK upon connect
but it was incomplete, because checks with the proxy protocol would
always enable polling for data receive and there was no way of
distinguishing data polling and delayed ack.

So we add a distinct delack flag to the connect() function so that
the caller decides whether or not to use a delayed ack regardless
of pending data (eg: when send-proxy is in use). Doing so covers all
combinations of { (check with data), (sendproxy), (smart-connect) }.
2012-11-24 10:24:27 +01:00
Willy Tarreau
0eb2bed561 BUG/MINOR: stats: fix inversion of the report of a check in progress
Recent fix for health checks 5a78f36d inverted the condition to display
a "*" in front of the check status on the stats page.
2012-11-24 00:20:24 +01:00
Willy Tarreau
4a6e5c6d69 BUG/MEDIUM: acl: make prune_acl_expr() correctly free ACL expressions upon exit
When leaving, during the deinit() process, prune_acl_expr() is called to
delete all ACL expressions. A bug was introduced with commit 34db1084 that
caused every other expression argument to be skipped, and more annoyingly,
it introduced the risk of scanning past the arg list and crashing or
freezing the old process during a reload.

Credits for finding this issue go to Dmitry Sivachenko who first reported
it, and second did a lot of research to narrow it down to a minimal
configuration.
2012-11-24 00:02:14 +01:00
Willy Tarreau
7d1df41171 BUG/MEDIUM: acl: correctly resolve all args, not just the first one
Since 1.5-dev9, ACLs support multiple args. The changes performed in
acl_find_targets() were bogus as they were not always applied to the
current argument being processed, but sometimes to the first one only.

Fortunately till now, all ACLs which support resolvable arguments have
it in the first place only, so there was no impact.
2012-11-23 23:47:36 +01:00
Willy Tarreau
50de90a228 MINOR: listeners: make the accept loop more robust when maxaccept==0
If some listeners are mistakenly configured with 0 as the maxaccept value,
then we now consider them as limited to one accept() at a time. This will
avoid issues such as the one fixed by the previous commit.
2012-11-23 20:22:10 +01:00
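A simplified model of the accept loop bound described here. The function `accept_batch` is hypothetical and replaces real `accept()` calls with a counter of pending connections, so that only the handling of a zero maxaccept is shown.

```c
/* Hypothetical model (not the real listener code): returns how many of
 * the `pending` connections would be accepted in one pass of the loop. */
int accept_batch(int maxaccept, int pending)
{
    /* A value of 0 is treated as "one accept() at a time". */
    int max = maxaccept ? maxaccept : 1;
    int done = 0;

    while (max-- && pending > 0) {
        pending--;  /* stands in for one successful accept() */
        done++;
    }
    return done;
}
```

Without the fallback to 1, a zero value would leave the loop without accepting anything while the listening socket stays readable, which matches the busy loop reported for peers.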
Willy Tarreau
ca57de3e7b BUG/MAJOR: peers: the listener's maxaccept was not set and caused loops
Recent commit 16a214 to move the maxaccept parameter to listeners didn't
set it on the peers' listeners, resulting in the value zero being used
there. This caused a busy loop for each peers section, because no incoming
connection could be accepted.

Thanks to Hervé Commowick for reporting this issue.
2012-11-23 20:21:37 +01:00
Willy Tarreau
cfd97c6f04 BUG/MEDIUM: checks: prevent TIME_WAITs from appearing also on timeouts
We need to disable lingering before closing on timeout too, otherwise
we accumulate TIME_WAITs.
2012-11-23 17:35:59 +01:00
Willy Tarreau
2b199c9ac3 MEDIUM: connection: provide a common conn_full_close() function
Several places got the connection close sequence wrong because it
was not obvious. In practice we always need the same sequence when
aborting, so let's have a common function for this.
2012-11-23 17:32:21 +01:00
Willy Tarreau
db3b4a2891 MINOR: checks: fix recv polling after connect()
Commit a522f801 moved a call to __conn_data_want_recv() just after the
connect() call, which is not 100% correct. First, it does not take errors
into account, even though this is harmless. Second, this change will only
be taken into account after the next call to conn_data_polling_update(), which
is not necessarily what is expected (eg: if an error is only reported on
the recv side).

So let's use conn_data_poll_recv() instead, which directly subscribes
the event to polling.
2012-11-23 16:32:33 +01:00
Willy Tarreau
b63b59641e BUG/MAJOR: checks: close FD on all timeouts
Since last commit, some timeouts were converted into an error to report
the status, and as a result, the socket was not closed because it was
supposed to have been done during the wake() call.

Close the socket as soon as the timeout is detected to fix the issue.
We also now make sure to initialize the connection flags first.
2012-11-23 16:22:08 +01:00
Willy Tarreau
74fa7fbec9 MEDIUM: checks: close the socket as soon as we have a response
Until now, the check socket was closed in the task which handles the
check, which can sometimes be substantially later when many tasks are
running. It's much cleaner to close() in the wake call, which also
helps remove some FD management from the task itself.

The code is faster and smaller, and fast health checks show a more
predictable behaviour.
2012-11-23 14:43:49 +01:00
Willy Tarreau
24db47e0cc MEDIUM: checks: avoid waking the application up for pure TCP checks
Pure TCP checks only use the SYN/ACK in response to a SYN. By forcing
the system to use delayed ACKs, it is possible to send an RST instead
of the ACK and thus ensure that the application will never be needlessly
woken up. This avoids error logs or counters on checked components since
the application is never made aware of this connection which dies in the
network stack.
2012-11-23 14:18:39 +01:00
Willy Tarreau
acbdc7a760 BUG/MINOR: checks: slightly clean the state machine up
The process_chk() function still did not consider the timeout when
it was woken up, so a spurious wakeup could trigger a false timeout. Some
checks were now redundant or could not be triggered (eg: L7 timeout).
So remove them and rearrange the timeout detection.
2012-11-23 14:02:57 +01:00
Willy Tarreau
5a78f36db3 MAJOR: checks: rework completely bogus state machine
The porting of checks to using connections was totally bogus. Some checks
were considered successful as soon as the connection was established,
regardless of any response. Some errors would be triggered upon recv
if polling was enabled for send or if the send channel was shut down.

Now the behaviour is much better. It would be cleaner to perform the
fd_delete() in wake_srv_chk() and to process failures and timeouts
separately, but this is already a good start.
2012-11-23 12:47:05 +01:00
Willy Tarreau
d3aac7088e CLEANUP: checks: rename some server check flags
Some server check flag names were not properly chosen and cause
analysis trouble, especially the CHK_RUNNING one which does not
mean that a check is running but that the server is running...

Here's the rename:
  CHK_RUNNING -> CHK_PASSED
  CHK_ERROR   -> CHK_FAILED
2012-11-23 11:32:12 +01:00
Willy Tarreau
e6d9702e7e MINOR: cli: report the msg state in full text in "show sess $PTR"
It's more convenient to debug with real state names.
2012-11-23 11:31:56 +01:00
William Lallemand
be0efd884d MINOR: buffer_dump with ASCII
Improve the buffer_dump function with ASCII output.
2012-11-23 11:13:16 +01:00
William Lallemand
00bf1dee9c BUG/MEDIUM: compression: does not forward trailers
Commit bf3ae617 introduced a regression in the forwarding of
trailers in compression mode.
2012-11-23 11:12:33 +01:00
Willy Tarreau
fd29cc537b MEDIUM: checks: avoid accumulating TIME_WAITs during checks
Some checks which do not induce a close from the server accumulate
local TIME_WAIT sockets because they're cleanly shut down. Typically
TCP probes cause this. This is very problematic when there are many
servers, when the checks are fast or when local source ports are rare.

So now we'll disable lingering on the socket instead of sending a
shutdown. Before doing this we try to drain any possibly pending data.
That way we avoid sending an RST when the server has closed first.

This change means that some servers will see more RSTs, but this is
needed to avoid local source port starvation.
2012-11-23 09:18:20 +01:00
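The close sequence described above (drain, disable lingering, close) might look like the sketch below. The function name `close_nolinger` is hypothetical and the real check code is organized differently. `SO_LINGER` with a zero timeout and `MSG_DONTWAIT` are standard socket interfaces.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: closes a check socket without leaving a local
 * TIME_WAIT entry. Returns 0 on success, -1 on error. */
int close_nolinger(int fd)
{
    char buf[512];
    struct linger nolinger = { .l_onoff = 1, .l_linger = 0 };

    /* Drain possibly pending data without blocking, so that a server
     * which has already closed is not answered with an RST. */
    while (recv(fd, buf, sizeof(buf), MSG_DONTWAIT) > 0)
        ;

    /* Lingering enabled with a zero timeout turns close() into an abort. */
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &nolinger, sizeof(nolinger)) == -1) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```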
Willy Tarreau
ef8a719f70 BUG/MINOR: checks: don't mark the FD as closed before transport close
Some future transport layers might need the connection's file descriptor
on ->close(), so we must not destroy it before we're finished with it.
2012-11-23 09:05:05 +01:00
Willy Tarreau
a522f801fb BUG/MEDIUM: checks: ensure we completely disable polling upon success
When a check succeeds, it used to only disable receive events while
it should disable both directions. The problem is that if the send
event was reported too, it could re-enable the recv event. In theory
this is not a problem as the task is going to be woken up, but if
there are many tasks in the queue and this task is not processed
immediately, we could theoretically face a storm of unprocessed events
(typically POLL_HUP).

So better stop both directions, prevent the send side from enabling
recv and have the process_chk() code enable both directions. This
will also help detect closes before the check is sent.

Note that all this mess has been inherited from the old code that used
the fd as a flag to report if a check was running. We should have a
dedicated flag and perform the fd_delete() in wake_srv_chk() instead.
2012-11-23 09:03:59 +01:00
Willy Tarreau
6b0a850503 BUG/MEDIUM: checks: mark the check as stopped after a connect error
Health checks currently still use the connection's fd to know whether
a check is running (this needs to change). When a health check
immediately fails during connect() because of a lack of local resource
(eg: port), we failed to unset the fd, so each time process_chk was
woken up after such an error, it believed a check was still running
and used to close the fd again instead of starting a new check. This
could result in other connections being closed because they were
assigned the same fd value.

The bug is only marked medium because when this happens, the system
is already in a bad state.

A comment was added above tcp_connect_server() to clarify that the
fd is *not* valid on error.
2012-11-23 09:03:29 +01:00
Willy Tarreau
55058a7c1e MINOR: stats: report HTTP compression stats per frontend and per backend
It was a bit frustrating to have no idea about the bandwidth saved by
HTTP compression. Now we have per-frontend and per-backend stats. The
stats on the HTTP interface are shown in a hover title in the "bytes out"
column if at least something was fed to the compressor. 3 new columns
appeared in the CSV stats output.
2012-11-22 01:07:40 +01:00
Willy Tarreau
83d84cfc8a BUILD: silence a warning on Solaris about usage of isdigit()
On Solaris, isdigit() is a macro and it complains about the use of
a char instead of the int for the argument. Let's cast it to an int
to silence it.
2012-11-22 01:04:31 +01:00
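For illustration, here is how a `char` is usually passed to the `<ctype.h>` classifiers. The commit itself casts to int. Going through `unsigned char`, as below, is the form generally recommended by the C standard's requirements for these functions and also avoids the Solaris warning. The function `count_leading_digits` is a hypothetical example, not haproxy code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical example: counts the decimal digits at the start of s. */
size_t count_leading_digits(const char *s)
{
    size_t n = 0;

    /* isdigit() takes an int whose value must be representable as an
     * unsigned char (or EOF), hence the cast on a plain char. */
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}
```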
Willy Tarreau
193b8c6168 MINOR: http: allow the cookie capture size to be changed
Some users need more than 64 characters to log large cookies. The limit
was set to 63 characters (and not 64 as previously documented). Now it
is possible to change this using the global "tune.http.cookielen" setting
if required.
2012-11-22 00:44:27 +01:00
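The 63 versus 64 discrepancy is what one would expect if a 64-byte buffer also has to hold a terminating zero. That explanation is an assumption, not something stated in the commit. The sketch below shows such a bounded copy, and `capture_cookie` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most dst_size - 1 bytes of the cookie
 * into dst and zero-terminates it. Returns the number of bytes kept. */
size_t capture_cookie(char *dst, size_t dst_size,
                      const char *src, size_t src_len)
{
    size_t n;

    if (dst_size == 0)
        return 0;

    /* One byte is reserved for the terminator (assumed layout). */
    n = (src_len < dst_size - 1) ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With a 64-byte destination, a 100-byte cookie is truncated to 63 characters.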
Willy Tarreau
f9fbfe8229 BUG/MAJOR: stream_interface: read0 not always handled since dev12
The connection handling changes introduced in 1.5-dev12 caused a
regression with commit 9bf9c14c. The issue is that the stream_sock_read0()
callback must update the channel flags to indicate that the side is closed
so that when process_session() is called, it can propagate the close to the
other side and terminate the session.

The issue only appears in HTTP tunnel mode and is a bit tricky to trigger.
It requires that the request channel is full with data flowing
from the client to the server and that both the response and the read0()
are received at once so that the flags are not updated, and that the HTTP
analyser switches to tunnel mode without being informed that the request
write side is closed. After that, process_session() does not know that the
connection has to be aborted either, and no more events appear on this side,
where the connection stays forever.

Many thanks to Igor at owind for testing several snapshots and for providing
valuable traces to reproduce and diagnose the issue!
2012-11-21 21:59:51 +01:00
Willy Tarreau
85d47f9d98 MINOR: cli: report an error message on missing argument to compression rate
"set rate-limit http-compression global" needs an integer and must
complain when it's not there.
2012-11-21 02:15:16 +01:00
William Lallemand
072a2bf537 MINOR: compression: CPU usage limit
New option 'maxcompcpuusage' in global section.
Sets the maximum CPU usage HAProxy can reach before stopping the
compression for new requests or decreasing the compression level of
current requests.  It works like 'maxcomprate' but is based on the Idle percentage.
2012-11-21 02:15:16 +01:00
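One way to read this option is as a minimum idle percentage below which compression is not started. The sketch below shows that reading only. The function name `compression_allowed` and the exact comparison are assumptions, not taken from the patch.

```c
/* Hypothetical helper: returns 1 if compression may be started for a new
 * request, given the current idle percentage and the configured maximum
 * CPU usage (both expressed in percent). */
int compression_allowed(unsigned int idle_pct, unsigned int max_cpu_usage)
{
    /* A CPU usage cap of N% is equivalent to requiring 100 - N% idle. */
    unsigned int min_idle = (max_cpu_usage >= 100) ? 0 : 100 - max_cpu_usage;

    return idle_pct >= min_idle;
}
```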
William Lallemand
c71407657d BUG/MINOR: compression: dynamic level increase
When using the compression rate limit, the compression level did not take
the maximum compression level into account during a session because the
test was done on the wrong variable.
2012-11-21 02:15:16 +01:00
William Lallemand
e3a7d99062 MINOR: compression: report zlib memory usage
Show the memory usage and the max memory available for zlib.
The value stored is now the memory used instead of the remaining
available memory.
2012-11-21 02:15:16 +01:00
William Lallemand
096f554ee1 MINOR: compression: rate limit in 'show info'
Show the compression rate limit 'CompressRateLim' in bytes per second on
the UNIX socket.
2012-11-21 01:58:11 +01:00
William Lallemand
8b52bb3878 MEDIUM: compression: use pool for comp_ctx
Use pool for comp_ctx, it is allocated during the comp_algo->init().
The allocation of comp_ctx is accounted for in the zlib_memory_available.
2012-11-21 01:56:47 +01:00