In preparation for newer balance algorithms, group the
sparse PR_O_BALANCE_* values into layer4 and layer7-based
algorithms. This will ease addition of newer algorithms.
This patch adds the "maxqueue" parameter to the server. This allows new
sessions to be immediately rebalanced when the server's queue is filled.
It's useful when session stickiness is just a performance boost (even a
huge one) but not a requirement.
This should only be used if session affinity isn't a hard functional
requirement but provides performance boost by keeping server-local
caches hot and compact).
Absence of 'maxqueue' option means unlimited queue. When queue gets filled
up to 'maxqueue' client session is moved from server-local queue to a global
one.
Hello,
This patch implements new statistics for SLA calculation by adding new
field 'Dwntime' with total down time since restart (both HTTP/CSV) and
extending status field (HTTP) or inserting a new one (CSV) with time
showing how long each server/backend is in a current state. Additionaly,
down transations are also calculated and displayed for backends, so it is
possible to know how many times selected backend was down, generating "No
server is available to handle this request." error.
New information are presentetd in two different ways:
- for HTTP: a "human redable form", one of "100000d 23h", "23h 59m" or
"59m 59s"
- for CSV: seconds
I believe that seconds resolution is enough.
As there are more columns in the status page I decided to shrink some
names to make more space:
- Weight -> Wght
- Check -> Chk
- Down -> Dwn
Making described changes I also made some improvements and fixed some
small bugs:
- don't increment s->health above 's->rise + s->fall - 1'. Previously it
was incremented an then (re)set to 's->rise + s->fall - 1'.
- do not set server down if it is down already
- do not set server up if it is up already
- fix colspan in multiple places (mostly introduced by my previous patch)
- add missing "status" header to CSV
- fix order of retries/redispatches in server (CSV)
- s/Tthen/Then/
- s/server/backend/ in DATA_ST_PX_BE (dumpstats.c)
Changes from previous version:
- deal with negative time intervales
- don't relay on s->state (SRV_RUNNING)
- little reworked human_time + compacted format (no spaces). If needed it
can be used in the future for other purposes by optionally making "cnt"
as an argument
- leave set_server_down mostly unchanged
- only little reworked "process_chk: 9"
- additional fields in CSV are appended to the rigth
- fix "SEC" macro
- named arguments (human_time, be_downtime, srv_downtime)
Hope it is OK. If there are only cosmetic changes needed please fill free
to correct it, however if there are some bigger changes required I would
like to discuss it first or at last to know what exactly was changed
especially since I already put this patch into my production server. :)
Thank you,
Best regards,
Krzysztof Oledzki
It is important to know how your installation performs. Haproxy masks
connection errors, which is extremely good for a client but it is bad for
an administrator (except people believing that "ignorance is a bless").
Attached patch adds retries and redispatches counters, so now haproxy:
1. For server:
- counts retried connections (masked or not)
2. For backends:
- counts retried connections (masked or not) that happened to
a slave server
- counts redispatched connections
- does not count successfully redispatched connections as backend errors.
Errors are increased only when client does not get a valid response,
in other words: with failed redispatch or when this function is not
enabled.
3. For statistics:
- display Retr (retries) and Redis (redispatches) as a "Warning"
information.
Removed old unused MODE_LOG and MODE_STATS, and replaced the "stats"
keyword in the global section. The new "stats" keyword in the global
section is used to create a UNIX socket on which the statistics will
be accessed. The client must issue a "show stat\n" command in order
to get a CSV-formated output similar to the output on the HTTP socket
in CSV mode.
It is now possible to get CSV ouput from the statistics by
simply appending ";csv" to the HTTP request sent to get the
stats. The fields keep the same ordering as in the HTML page,
and a field "pxname" has been prepended at the beginning of
the line.
A new file, proto_uxst.c, implements support of PF_UNIX sockets
of type SOCK_STREAM. It relies on generic stream_sock_read/write
and uses its own accept primitive which also tries to be generic.
Right now it only implements an echo service in sight of a general
support for start dumping via unix socket. The echo code is more
of a proof of concept than useful code.
A new generic protocol mechanism has been added. It provides
an easy method to implement new protocols with different
listeners (eg: unix sockets).
The listeners are automatically started at the right moment
and enabled after the possible fork().
The stream_sock_* functions had to know about sessions just in
order to get the server's address for a connect() operation. This
is not desirable, particularly for non-IP protocols (eg: PF_UNIX).
Put a pointer to the peer's sockaddr_storage or sockaddr address
in the fdtab structure so that we never need to look further.
With this small change, the stream_sock.c file is now 100% protocol
independant.
When one server appears at the same position in multiple backends, it
receives all the checks from all the backends exactly at the same time
because the health-checks are only spread within a backend but not
globally.
Attached patch implements per-server start delay in a different way.
Checks are now spread globally - not locally to one backend. It also makes
them start faster - IMHO there is no need to add a 'server->inter' when
calculating first execution. Calculation were moved from cfgparse.c to
checks.c. There is a new function start_checks() and now it is not called
when haproxy is started in MODE_CHECK.
With this patch it is also possible to set a global 'spread-checks'
parameter. It takes a percentage value (1..50, probably something near
5..10 is a good idea) so haproxy adds or removes that many percent to the
original interval after each check. My test shows that with 18 backends,
54 servers total and 10000ms/5% it takes about 45m to mix them completely.
I decided to use rand/srand pseudo-random number generator. I am aware it
is not recommend for a good randomness but a) we do not need a good random
generator here b) it is probably the most portable one.
The following patch will give the ability to tweak socket linger mode.
You can use this option with "option nolinger" inside fronted or backend
configuration declaration.
This will help in environments where lots of FIN_WAIT sockets are
encountered.
src/chtbl.c, src/hashpjw.c and src/list.c are distributed under
an obscure license. While Aleks and I believe that this license
is OK for haproxy, other people think it is not compatible with
the GPL.
Whether it is or not is not the problem. The fact that it rises
a doubt is sufficient for this problem to be addressed. Arnaud
Cornet rewrote the unclear parts with clean GPLv2 and LGPL code.
The hash algorithm has changed too and the code has been slightly
simplified in the process. A lot of care has been taken in order
to respect the original API as much as possible, including the
LGPL for the exportable parts.
The new code has not been thoroughly tested but it looks OK now.
The stats page now supports an option to hide servers which are DOWN
and to enable/disable automatic refresh. It is also possible to ask
for an immediate refresh.
When a very large number of servers is configured (thousands),
shutting down many of them at once could lead to large number
of calls to recalc_server_map() which already takes some time.
This would result in an O(N^3) computation time, leading to
noticeable pauses on slow embedded CPUs on test platforms.
Instead, mark the map as dirty and recalc it only when needed.
The new "use_backend" keyword permits full content switching by the
use of ACLs. Its usage is simple :
use_backend <backend_name> {if|unless} <acl_cond>
Implemented the "-i" option on ACLs to state that the matching
will have to be performed for all patterns ignoring case. The
usage is :
acl <aclname> <aclsubject> -i pattern1 ...
If a pattern must begin with "-", either it must not be the first one,
or the "--" option should be specified first.
hdr(x), hdr_reg(x), hdr_beg(x), hdr_end(x), hdr_sub(x), hdr_dir(x),
hdr_dom(x), hdr_cnt(x) and hdr_val(x) have been implemented. They
apply to any of the possibly multiple values of header <x>.
Right now, hdr_val() is limited to integer matching, but it should
reasonably be upgraded to match long long ints.
Some fetches such as 'line' or 'hdr' need to know the direction of
the test (request or response). A new 'dir' parameter is now
propagated from the caller to achieve this.
ACLs now support operators such as 'eq', 'le', 'lt', 'ge' and 'gt'
in order to give more flexibility to the language. Because of this
change, the 'dst_limit' keyword changed to 'dst_conn' and now requires
either a range or a test such as 'dst_conn lt 1000' which is more
understandable.
By default, epoll/kqueue used to return as many events as possible.
This could sometimes cause huge latencies (latencies of up to 400 ms
have been observed with many thousands of fds at once). Limiting the
number of events returned also reduces the latency by avoiding too
many blind processing. The value is set to 200 by default and can be
changed in the global section using the tune.maxpollevents parameter.
A second occurrence of read-timeout rearming was present in stream_sock.c.
To fix the problem, it was necessary to put the shutdown information in
the buffer (already planned).
The timeout functions were difficult to manipulate because they were
rounding results to the millisecond. Thus, it was difficult to compare
and to check what expired and what did not. Also, the comparison
functions were heavy with multiplies and divides by 1000. Now, all
timeouts are stored in timevals, reducing the number of operations
for updates and leading to cleaner and more efficient code.
Peter van Dijk contributed this patch which implements the "smtpchk"
option, which is to SMTP what "httpchk" is to HTTP. By default, it sends
"HELO localhost" to the servers, and waits for the 250 message, but it
can also send a specific request.
The new 'block' keyword makes it possible to block a request based on
ACL test results. Block accepts two optional arguments : 'if' <cond>
and 'unless' <cond>.
The request will be blocked with a 403 response if the condition is validated
(if) or if it is not (unless). Do not rely on this one too much, as it's more
of a proof of concept helping in developing other matches.
This framework offers all other subsystems the ability to register
ACL matching criteria. Some generic matching functions are already
provided. Others will come soon and the framework shall evolve.
There are multiple places where the client's destination address is
required. Let's store it in the session when needed, and add a flag
to inform that it has been retrieved.
The rbtree-based wait queue consumes a lot of CPU. Use the ul2tree
instead. Lots of cleanups and code reorganizations made it possible
to reduce the task struct and simplify the code a bit.
The principle behind speculative I/O is to speculatively try to
perform I/O before registering the events in the system. This
considerably reduces the number of calls to epoll_ctl() and
sometimes even epoll_wait(), and manages to increase overall
performance by about 10%.
The new poller has been called "sepoll". It is used by default
on Linux when it works. A corresponding option "nosepoll" and
the command line argument "-ds" allow to disable it.
Now fdtab can contain the FD_POLL_* events so that the pollers
which can fill them can give userful information to readers and
writers about the precise condition of wakeup.
Some pollers such as kqueue lose their FD across fork(), meaning that
the registered file descriptors are lost too. Now when the proxies are
started by start_proxies(), the file descriptors are not registered yet,
leaving enough time for the fork() to take place and to get a new pollfd.
It will be the first call to maintain_proxies that will register them.