In preparation for moving the POST processing to the applet, we first
add new states to the HTTP I/O handler. Till now st0 was only 0/1 for
start/end. We now replace it with an enum.
This field was used by dumpstats to retrieve a pointer to the current
session, which may already be found from ->owner. With this change,
the stats code doesn't need the connection at all anymore.
We're trying to move the applets out of the struct connection. So
let's remove the dependence on xprt_st and introduce si->applet.st2
to store the missing contextual data instead.
We now have to report 2 conflicting information on the stats page :
- NOLB = server which returns 404 and stops load balancing ;
- DRAIN = server with a weight forced to zero
The DRAIN state was previously detected from eweight==0 and represented in
blue so that a temporarily disabled server was noticed. This was done by
commit cc8bb92 (MINOR: stats: show soft-stopped servers in different color).
This choice suffered from a small defect however, which is that a server
with a zero weight was reported in this color whatever its state (even down
or switching).
Also, one of the motivations for the color above was because the NOLB state
is barely detectable as it's very close to the UP state.
Since commit 8c3d0be (MEDIUM: Add DRAIN state and report it on the stats page),
we have the new DRAIN state to show servers with a zero weight. The colors are
unfortunately very close to those of the MAINT state, and some users were
confused by the disappearance of the blue bars.
Additionally, the NOLB state had precedence over DRAIN, which could be an
issue since DRAIN is the only thing the admin can act on, so once NOLB was
shown, there was nothing to indicate that the weight was forced to zero.
By switching the two priorities we can report DRAIN (forced mode) before
NOLB (detected mode).
The best solution to fix all this is to reuse the previous blue color for
all cases where weight == 0, whether it's set by config / agent / cli (DRAIN)
or detected by a 404 response (NOLB). However we only use this color when the
server is 100% UP. If it's going down we switch to the usual yellow color
showing failed checks, and when it's down it keeps its usual red color.
That way, a blue bar on the display indicates a server not taking new
sessions but perfectly up. And other colors keep their usual meaning.
When a server tracks another one, its state on the stats page always reports
"via xx/yy". That's convenient to know what server to act on to change the
state. But it is also possible to force the tracking server itself into
maintenance mode and in this case we should not report "via xx/yy" because
the tracked server can't do anything to change the server's state, which
is confusing. In practice there is nothing wrong in leaving it as-is,
except that it's highly misleading when looking at the stats page.
Note that we only change the HTML output, not the CSV one. The states are
already different : "MAINT" vs "MAINT(via)" and we expect anyone coding a
monitoring system based on the CSV output to know the differences between
all possible states.
Add a DRAIN sub-state for a server which
will be shown on the stats page instead of UP if
its effective weight is zero.
Also, log if a server enters or leaves the DRAIN state
as the result of an agent check.
Signed-off-by: Simon Horman <horms@verge.net.au>
The syntax of this new commands are:
enable agent <backend>/<server>
disable agent <backend>/<server>
These commands allow temporarily stopping and subsequently
re-starting an auxiliary agent check. The effect of this is as follows:
New checks are only initialised when the agent is in the enabled. Thus,
disable agent will prevent any new agent checks from begin initiated until
the agent re-enabled using enable agent.
When an agent is disabled the processing of an auxiliary agent check that
was initiated while the agent was set as enabled is as follows: All
results that would alter the weight, specifically "drain" or a weight
returned by the agent, are ignored. The processing of agent check is
otherwise unchanged.
The motivation for this feature is to allow the weight changing effects
of the agent checks to be paused to allow the weight of a server to be
configured using set weight without being overridden by the agent.
Signed-off-by: Simon Horman <horms@verge.net.au>
This is achieved by moving rise and fall from struct server to struct check.
After this move the behaviour of the primary check, server->check is
unchanged. However, the secondary agent check, server->agent now has
independent rise and fall values each of which are set to 1.
The result is that receiving "fail", "stopped" or "down" just once from the
agent will mark the server as down. And receiving a weight just once will
allow the server to be marked up if its primary check is in good health.
This opens up the scope to allow the rise and fall values of the agent
check to be configurable, however this has not been implemented at this
stage.
Signed-off-by: Simon Horman <horms@verge.net.au>
The column used to report the throttle percentage when a server is in
slowstart is based on the time only. This is wrong, because server weights
in slowstart are updated at most once a second, so the reported value is
wrong at least fo rone second during each step, which means all the time
when using short delays (< 20s).
The second point is that it's disturbing to see a weight < 100% without
any throttle at the end of the period (during the last second), because
the effective weight has not yet been updated.
Instead, we now compute the exact ratio between eweight and uweight and
report it. It's always accurate and describes the value being used instead
of using only the date.
It can be backported to 1.4 though it's not particularly important.
This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.
Signed-off-by: Simon Horman <horms@verge.net.au>
Add state to struct check. This is currently used to store one bit,
CHK_RUNNING, which is set if a check is running and clear otherwise.
This bit was previously SRV_CHK_RUNNING of the state element of struct
server.
This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Paramatise the following functions over the check of a server
* set_server_down
* set_server_up
* srv_getinter
* server_status_printf
* set_server_check_status
* set_server_disabled
* set_server_enabled
Generally the server parameter of these functions has been removed.
Where it is still needed it is obtained using check->server.
This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.
By paramatising these functions they may act on each of the checks
without further significant modification.
Explanation of the SSP_O_HCHK portion of this change:
* Prior to this patch SSP_O_HCHK serves a single purpose which
is to tell server_status_printf() weather it should print
the details of the check of a server or not.
With the paramatisation that this patch adds there are two cases.
1) Printing the details of the check in which case a
valid check parameter is needed.
2) Not printing the details of the check in which case
the contents check parameter are unused.
In case 1) we could pass SSP_O_HCHK and a valid check and;
In case 2) we could pass !SSP_O_HCHK and any value for check
including NULL.
If NULL is used for case 2) then SSP_O_HCHK becomes supurfulous
and as NULL is used for case 2) SSP_O_HCHK has been removed.
Signed-off-by: Simon Horman <horms@verge.net.au>
Mark Brooks reported the following issue :
"My table looks like this -
0x24a8294: key=192.168.136.10 use=0 exp=1761492 server_id=3
0x24a8344: key=192.168.136.11 use=0 exp=1761506 server_id=2
0x24a83f4: key=192.168.136.12 use=0 exp=1761520 server_id=3
0x24a84a4: key=192.168.136.13 use=0 exp=1761534 server_id=2
0x24a8554: key=192.168.136.14 use=0 exp=1761548 server_id=3
0x24a8604: key=192.168.136.15 use=0 exp=1761563 server_id=2
0x24a86b4: key=192.168.136.16 use=0 exp=1761580 server_id=3
0x24a8764: key=192.168.136.17 use=0 exp=1761592 server_id=2
0x24a8814: key=192.168.136.18 use=0 exp=1761607 server_id=3
0x24a88c4: key=192.168.136.19 use=0 exp=1761622 server_id=2
0x24a8974: key=192.168.136.20 use=0 exp=1761636 server_id=3
0x24a8a24: key=192.168.136.21 use=0 exp=1761649 server_id=2
im running the command -
socat unix-connect:/var/run/haproxy.stat stdio <<< 'clear table VIP_Name-2 data.server_id eq 2'
Id assume that the entries with server_id = 2 would be removed but its
removing everything each time."
The cause of the issue is a missing test for skip_entry when deciding
whether to clear the key or not. The test was present when only the
last node is to be removed, so removing only the first node from a
list of two always did the right thing, explaining why it remained
unnoticed in basic unit tests.
The bug was introduced by commit 8fa52f4e which attempted to fix a
previous issue with this feature where only the last node was removed.
This bug is 1.5-specific and does not require any backport.
The "set table" statement allows to create new entries with their respective
values. Till now it was limited to a single data type per line, requiring as
many "set table" statements as the desired data types to be set. Since this
is only a parser limitation, this patch gets rid of it. It also allows the
creation of a key with no data types (all reset to their default values).
Since commit 654694e1, it has been possible to feed some data into
stick tables from the CLI. That commit considered that frequency
counters would only have their previous value set, so that they
progressively fade out. But this does not match any real world
use case in fact. The only reason for feeding a freq counter is
to pass some data learned outside. We certainly don't want to see
such data start to vanish immediately, otherwise it will force the
external scripts to loop very frequently to limit the losses.
So let's set the current value instead in order to guarantee that
the data remains stable over the full period, then starts to fade
out between 1* and 2* the period.
Benoit Dolez reported a failure to start haproxy 1.5-dev19. The
process would immediately report an internal error with missing
fetches from some crap instead of ACL names.
The cause is that some versions of gcc seem to trim static structs
containing a variable array when moving them to BSS, and only keep
the fixed size, which is just a list head for all ACL and sample
fetch keywords. This was confirmed at least with gcc 3.4.6. And we
can't move these structs to const because they contain a list element
which is needed to link all of them together during the parsing.
The bug indeed appeared with 1.5-dev19 because it's the first one
to have some empty ACL keyword lists.
One solution is to impose -fno-zero-initialized-in-bss to everyone
but this is not really nice. Another solution consists in ensuring
the struct is never empty so that it does not move there. The easy
solution consists in having a non-null list head since it's not yet
initialized.
A new "ILH" list head type was thus created for this purpose : create
an Initialized List Head so that gcc cannot move the struct to BSS.
This fixes the issue for this version of gcc and does not create any
burden for the declarations.
Bryan Talbot reported that a config with only "stats bind-process 1" in
the global section would crash during parsing. This bug was introduced
with this new statement in 1.5-dev13. The stats frontend must be allocated
if it was not yet.
No backport is needed. The workaround consists in having previously declared
any other stats keyword (generally stats socket).
A "soft-stopped" server (weight 0) can be missed easily in the
webinterface. Fix this by using a specific class (and color).
Signed-off-by: Lukas Tribus <luky-37@hotmail.com>
When a server state change happens, a status bar appears with a link. Since
commit 1.5-dev16 (commit e7dbfc66), it was not visible anymore because of
the CSS properties set on the whole "div" tag used by the tooltips. Fix this
by using a specific class for the tooltips.
The confirmation link on the web interface forgets to set the displaying
options, so that when one clicks on "[X] Action processed successfully",
the auto-refresh, up/down and scope search settings are lost. We need to
pass all these info in the new link.
commit 88c278fadf provided a new input field on the statistics page which was
focused by default. The autofocus prevent to scroll down the page with the
keyboard immediately afterloading it, without the need to leave that field.
It also interfered with "stats refresh" by scrolling up to this field on each
refresh.
This small patch removes the autofocus keyword and adds a tabindex="1" as
suggested by Vincent Bernat. It also cleans up the generated html code.
This patch adds a "scope" box in the statistics page in order to
display only proxies with a name that contains the requested value.
The scope filter is preserved across all clicks on the page.
Will Glass-Husain reported this breakage which was caused by commit
dec9814e merged in 1.5-dev12, and which was incomplete, resulting in
both "clear table" and "show table" to do the same thing.
No backport is needed.
Break out set weight processing code.
This is in preparation for reusing the code.
Also, remove duplicate check in nested if clauses.
{px->lbprm.algo & BE_LB_PROP_DYN) is checked by
the immediate outer if clause, so there is no need
to check it a second time.
Signed-off-by: Simon Horman <horms@verge.net.au>
Currently s->listener is set for all sessions, but this may not remain
the case forever so we already check s->listener for validity. On check
was missed.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
These macros (U2H, U2A, LIM2A, ...) have been used with an explicit
index for the local storage variable, making it difficult to change
log formats and causing a few issues from time to time. Let's have
a single macro with a rotating index so that up to 10 conversions
may be used in a single call.
Frontend, listener, server and backend detailed counters are now spread
over several lines in a table when the pointer hovers over the <u> area.
The values are much more readble, and the extra space gained this way
allowed to report some percentages.
Note: some incoherencies still exist between some counters. For example,
the backend's cum_conn is increased when a session is assigned instead
of increasing cum_sess.
Using titles to report detailed information is not convenient. The
browser decides to wrap the line where it wants, generally the box
quickly fades away, and it's not possible to copy-paste the text
from the box.
By using two levels, we can make a block appear/disappear depending
on whether its parent it being hovered or not, for example :
.tip { display: none; }
u:hover .tip { display: block; }
or better:
.tip { display: block; visibility: hidden; }
u:hover .tip { visibility: visible; }
Toggling visibility ensures that the place required to display the block
is always reserved. This is important to display boxes that are close to
the edges.
Then using <span class="tip">this is a box</span> will make the text
appear only when the upper <u> is hovered.
But this still adds much text. So instead we use a generic <div> tag
which we don't use anywhere else. That way we don't have to specify a
class :
div { display: block; visibility: hidden; }
u:hover div { visibility: hidden; }
This works pretty well, even in old browsers from 2005.
This commit does not change the display format, it only replaces the
title attribute with the div tag. Later commits will adjust the layout.
All the functions that were called "http" or "raw" have been cleaned up
to support either only HTML and use that word in their name, or support
both HTML and CSV and be usable both from the HTTP and the CLI entry
points.
stats_http_dump() now explicitly checks for HTML mode before adding
HTML-specific headers/trailers, and has been renamed since it's now
used by both entry points.
Some more duplicated code could be removed. The patch looks big but it's
mostly due to re-indents resulting from the moves or comments updates.
The calling sequences now look like this :
cli_io_handler()
-> stats_dump_sess_to_buffer() // "show sess"
-> stats_dump_errors_to_buffer() // "show errors"
-> stats_dump_info_to_buffer() // "show info"
-> stats_dump_stat_to_buffer() // "show stat"
-> stats_dump_csv_header()
-> stats_dump_proxy_to_buffer()
-> stats_dump_fe_stats()
-> stats_dump_li_stats()
-> stats_dump_sv_stats()
-> stats_dump_be_stats()
http_stats_io_handler()
-> stats_dump_stat_to_buffer() // same as above, but used for CSV or HTML
-> stats_dump_csv_header() // emits the CSV headers (same as above)
-> stats_dump_html_head() // emits the HTML headers
-> stats_dump_html_info() // emits the equivalent of "show info" at the top
-> stats_dump_proxy_to_buffer() // same as above, valid for CSV and HTML
-> stats_dump_html_px_hdr()
-> stats_dump_fe_stats()
-> stats_dump_li_stats()
-> stats_dump_sv_stats()
-> stats_dump_be_stats()
-> stats_dump_html_px_end()
-> stats_dump_html_end() // emits HTML trailer
The HTTP header injection that are performed in dumpstats when responding
or when redirecting a POST request have nothing to do in dumpstats. They
do not use any state from the stats, and are 100% HTTP. Let's make the
headers there in the HTTP core, and have dumpstats only produce stats.
The dumpstats code looks like a spaghetti plate. Several functions are
supposed to be able to do several things but rely on complex states to
dispatch the work to independant functions. Most of the HTML output is
performed within the switch/case statements of the whole state machine.
Let's clean this up by adding new functions to emit the data and have
a few more iterators to avoid relying on so complex states.
The new stats dump sequence looks like this for CLI and for HTTP :
cli_io_handler()
-> stats_dump_sess_to_buffer() // "show sess"
-> stats_dump_errors_to_buffer() // "show errors"
-> stats_dump_raw_info_to_buffer() // "show info"
-> stats_dump_raw_info()
-> stats_dump_raw_stat_to_buffer() // "show stat"
-> stats_dump_csv_header()
-> stats_dump_proxy()
-> stats_dump_px_hdr()
-> stats_dump_fe_stats()
-> stats_dump_li_stats()
-> stats_dump_sv_stats()
-> stats_dump_be_stats()
-> stats_dump_px_end()
http_stats_io_handler()
-> stats_http_redir()
-> stats_dump_http() // also emits the HTTP headers
-> stats_dump_html_head() // emits the HTML headers
-> stats_dump_csv_header() // emits the CSV headers (same as above)
-> stats_dump_http_info() // note: ignores non-HTML output
-> stats_dump_proxy() // same as above
-> stats_dump_http_end() // emits HTML trailer
Benjamin Polidore reported a build issue on Solaris with gcc 4.2.4 where
stdbool is not usable without c99. It only appeared at one location in
dumpstats and is totally useless, let's use the more common and portable
int as everywhere else.
Commit 5730c68b changed to display compression ratios based on 2xx
responses, but we should then check that there are such responses
instead of checking for requests. The risk is a divide error if there
are some requests but no 2xx yet (eg: redirect).
Since only responses with status 200 can be compressed, let's only count the
ratio of compressed responses on the basis of the 2xx responses and not all
of them. Note that responses 206 are still included in this count but it gives
a better figure, especially for places where authentication is used and 401 is
common.
This change removes pointers for known types (stream_interface, ...),
adds buffer pointers and sizes, and moves buffer information to their
own line. The output is cleaner with shorter lines and slightly more
lines.
show sess <id> puts a backref into the session it's dumping. If the output
is interrupted, the backref cannot always be removed because it's only done
in the I/O handler. This can randomly corrupt the backref list when the
session closes, because it passes the pointer to the next session which
itself might be watched.
The case is hard to reproduce (hundreds of attempts) but monitoring systems
might encounter it frequently.
Thus we have to add a release handler which does the cleanup even when the
I/O handler is not called.
This issue should also be present in 1.4 so the patch should be backported.
Sometimes when debugging haproxy, it is important to take a full
snapshot of all sessions and their respective states. Till now it
was complicated to do because we had to use scripts and sessions
would vanish between two runs.
Now with this command we have the same output as "show sess $id"
but for all sessions in the table. This is a debugging command only,
it should only be used by developers as it is never guaranteed to
perfectly work !
Depending on the content-types and accept-encoding fields, some responses
might or might not be compressed. Let's have a counter of the number of
compressed responses and report it in the stats to help improve compression
usage.
Some cosmetic issues were fixed in the CSV output too (missing commas at the
end).
It's interesting to know the average compression ratio obtained on
frontends and backends without having to compute it by hand, so let's
report it in the HTML stats.
The porting of checks to using connections was totally bogus. Some checks
were considered successful as soon as the connection was established,
regardless of any response. Some errors would be triggered upon recv
if polling was enabled for send or if the send channel was shut down.
Now the behaviour is much better. It would be cleaner to perform the
fd_delete() in wake_srv_chk() and to process failures and timeouts
separately, but this is already a good start.
It was a bit frustrating to have no idea about the bandwidth saved by
HTTP compression. Now we have per-frontend and per-backend stats. The
stats on the HTTP interface are shown in a hover title in the "bytes out"
column if at least something was fed to the compressor. 3 new columns
appeared in the CSV stats output.