Consistent hashing provides some interesting advantages over common
hashing. It avoids full redistribution in case of a server failure,
or when expanding the farm. This has a cost however, the hashing is
far from being perfect, as we associate a server to a request by
searching the server with the closest key in a tree. Since servers
appear multiple times based on their weights, it is recommended to
use weights larger than approximately 10-20 in order to smoothen
the distribution a bit.
In some cases, playing with weights will be the only solution to
make a server appear more often and increase chances of being picked,
so stats are very important with consistent hashing.
In order to indicate the type of hashing, use :
hash-type map-based (default, old one)
hash-type consistent (new one)
Consistent hashing can make sense in a cache farm, in order not
to redistribute everyone when a cache changes state. It could also
probably be used for long sessions such as terminal sessions, though
that has not be attempted yet.
More details on this method of hashing here :
http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/
Recent "struct chunk rework" introduced a NULL pointer dereference
and now haproxy segfaults if auth is required for stats but not found.
The reason is that size_t cannot store negative values, but current
code assumes that "len < 0" == uninitialized.
This patch fixes it.
There are a few remaining max values that need to move to counters.
Also, the counters are more often used than some config information,
so get them closer to the other useful struct members for better cache
efficiency.
This patch allows to collect & provide separate statistics for each socket.
It can be very useful if you would like to distinguish between traffic
generate by local and remote users or between different types of remote
clients (peerings, domestic, foreign).
Currently no "Session rate" is supported, but adding it should be possible
if we found it useful.
Doing this, we can remove the last BF_HIJACK user and remove
produce_content(). s->data_source could also be removed but
it is currently used to detect if the stats or a server was
used.
The stats handler used to store internal states in s->ana_state. Now
we only rely on si->st0 in which we can store as many states as we
have possible outputs. This cleans up the stats code a lot and makes
it more maintainable. It has also reduced code size by a few hundred
bytes.
We can simplify the code in the stats functions using buffer_feed_chunk()
instead of buffer_write_chunk(). Let's start with this function. This
patch also fixed an issue where we could dump past the end of the capture
buffer if it is shorter than the captured request.
Calling buffer_shutw() marks the buffer as closed but if it was already
closed in the other direction, the stream interface is not marked as
closed, causing infinite loops.
We took this opportunity to completely remove buffer_shutw() and buffer_shutr()
which have no reason to be used at all and which will always cause trouble
when directly called. The stats occurrence was the last one.
We need to remove hash map accesses out of backend.c if we want to
later support new hash methods. This patch separates the hash computation
method from the server lookup. It leaves the lookup function to lb_map.c
and calls it with the result of the hash.
It was becoming painful to have all the LB algos in backend.c.
Let's move them to their own files. A few hashing functions still
need be broken in two parts, one for the contents and one for the
map position.
There is no reason to inline functions which are used to grab a server
depending on an LB algo. They are large and used at several places.
Uninlining them saves 400 bytes of code.
We can get rid of the stats analyser by moving all the stats code
to a stream interface applet. Above being cleaner, it provides new
advantages such as the ability to process requests and responses
from the same function and work only with simple state machines.
There's no need for any hijack hack anymore.
The direct advantage for the user are the interactive mode and the
ability to chain several commands delimited by a semi-colon. Now if
the user types "prompt", he gets a prompt from which he can send
as many requests as he wants. All outputs are terminated by a
blank line followed by a new prompt, so this can be used from
external tools too.
The code is not very clean, it needs some rework, but some part
of the dirty parts are due to the remnants of the hijack mode used
in the old functions we call.
The old AN_REQ_STATS_SOCK analyser flag is now unused and has been
removed.
It will soon be necessary to have stream interfaces running as part of
the current task, or as independant tasks. For instance when we want to
implement compression or SSL. It will also be used for applets running
as stream interfaces.
These new functions are used to perform exactly that. Note that it's
still not easy to write a simple echo applet and more functions will
likely be needed.
Those two functions did not correctly deal with full buffers and/or
buffers that wrapped around. Buffer_skip() was even able to incorrectly
set buf->w further than the end of buffer if its len argument was wrong,
and buffer_si_getline() was able to incorrectly return a length larger
than the effective buffer data available.
It's important that these functions set these flags themselves, otherwise
the callers will always have to do this, and there is no valid reason for
not doing it.
Collect information about last health check result,
including L7 code if possible (for example http or smtp
return code) and time took to finish last check.
Health check info is provided on both stats pages (html & csv)
and logged when a server is marked UP or DOWN. Currently active
check are marked with an asterisk, but only in html mode.
Currently there are 14 status codes:
UNK -> unknown
INI -> initializing
SOCKERR -> socket error
L4OK -> check passed on layer 4, no upper layers testing enabled
L4TOUT -> layer 1-4 timeout
L4CON -> layer 1-4 connection problem, for example "Connection refused"
(tcp rst) or "No route to host" (icmp)
L6OK -> check passed on layer 6
L6TOUT -> layer 6 (SSL) timeout
L6RSP -> layer 6 invalid response - protocol error
L7OK -> check passed on layer 7
L7OKC -> check conditionally passed on layer 7, for example
404 with disable-on-404
L7TOUT -> layer 7 (HTTP/SMTP) timeout
L7RSP -> layer 7 invalid response - protocol error
L7STS -> layer 7 response error, for example HTTP 5xx
In TCP, we don't want to forward chunks of data, we want to forward
indefinitely. This patch introduces a special value for the amount
of data to be forwarded. When buffer_forward() is called with
BUF_INFINITE_FORWARD, it configures the buffer to never stop
forwarding until the end.
The BF_EMPTY flag was once used to indicate an empty buffer. However,
it was used half the time as meaning the buffer is empty for the reader,
and half the time as meaning there is nothing left to send.
"nothing to send" is only indicated by "->send_max=0 && !pipe". Once
we fix this, we discover that the flag is not used anymore. So the
flags has been renamed BF_OUT_EMPTY and means exactly the condition
above, ie, there is nothing to send.
Doing so has allowed us to remove some unused tests for emptiness,
but also to uncover a certain amount of situations where the flag
was not correctly set or tested.
The BF_WRITE_ENA buffer flag became very complex to deal with, because
it was used to :
- enable automatic connection
- enable close forwarding
- enable data forwarding
The last point was not very true anymore since we introduced ->send_max,
but still the test remained everywhere. This was causing issues such as
impossibility to connect without forwarding data, impossibility to prevent
closing when data was forwarded, etc...
This patch clarifies the situation by getting rid of this multi-purpose
flag and replacing it with :
- data forwarding based only on ->send_max || ->pipe ;
- a new BF_AUTO_CONNECT flag to allow automatic connection and only
that ;
- ability to perform an automatic connection when ->send_max or ->pipe
indicate that data is waiting to leave the buffer ;
- a new BF_AUTO_CLOSE flag to let the producer automatically set the
BF_SHUTW_NOW flag when it gets a BF_SHUTR.
During this cleanup, it was discovered that some tests were performed
twice, or that the BF_HIJACK flag was still tested, which is not needed
anymore since ->send_max replcaed it. These places have been fixed too.
These cleanups have also revealed a few areas where the other flags
such as BF_EMPTY are not cleanly used. This will be an opportunity for
a second patch.
By inlining this function and slightly reordering it, we can double
the getchar/putchar test throughput, and reduce its footprint by about
40 bytes. Also, it was the only non-inlined char-based function, which
now makes it more consistent this time.
This function is used to cut the "tail" of a buffer, which means strip it
to the length of unsent data only, and kill any remaining unsent data. Any
scheduled forwarding is stopped. This is mainly to be used to send error
messages after existing data. It does the same as buffer_erase() for buffers
without pending outgoing data.
The computations in buffer_forward() were only valid if buffer_forward()
was used on a buffer which had no more data scheduled for forwarding.
This is always the case right now so this bug is not yet triggered but
it will soon be. Now we correctly discount the bytes to be forwarded
from the data already present in the buffer.
This function works like a traditional putchar() except that it
can return 0 if the output buffer is full.
Now a basic character-based echo function would look like this, from
a stream interface :
while (1) {
c = buffer_si_peekchar(req);
if (c < 0)
break;
if (!buffer_si_putchar(res, c)) {
si->flags |= SI_FL_WAIT_ROOM;
break;
}
buffer_skip(req, 1);
req->flags |= BF_WRITE_PARTIAL;
res->flags |= BF_READ_PARTIAL;
}
The buffer_si_peekline() function is sort of a fgets() to be used from a
stream interface. It returns a complete line whenever possible, and does
not update the buffer's pointer, so that the reader is free to consume
what it wants to.
buffer_si_peekchar() only returns one character, and also needs a call
to buffer_skip() once the character is definitely consumed.
This functions act like their buffer_write*() counter-parts,
except that they're specifically designed to be used from a
stream interface handler, as they carefully check size limits
and automatically advance the read pointer depending on the
to_forward attribute.
buffer_feed_chunk() is an inline calling buffer_feed() as both
are the sames. For this reason, buffer_write_chunk() has also
been turned into an inline which calls buffer_write().
buffer_contig_space(), buffer_contig_data() and buffer_skip()
provide easy methods to extract/insert data from/into a buffer.
buffer_write() and buffer_write_chunk() currently do not check
max_len nor to_forward, so they will quickly become embarrassing
to use or will need an equivalent. The reason is that they are
used to build error messages which currently are not subject to
analysis.
The first step towards dynamic buffer size consists in removing
all static definitions of the buffer size. Instead, we store a
buffer's size in itself. Right now they're all preinitialized
to BUFSIZE, but we will change that.
The remains of the stats socket code has nothing to do in proto_uxst
anymore and must move to dumpstats. The code is much cleaner and more
structured. It was also an opportunity to rename AN_REQ_UNIX_STATS
as AN_REQ_STATS_SOCK as the stats socket is no longer unix-specific
either.
The last item refering to stats in proto_uxst is the setting of the
task's nice value which should in fact come from the listener.
process_session() is now ready to handle unix stats sockets. This
first step works and old code has not been removed. A cleanup is
required. The stats handler is not unix socket-centric anymore and
should move to dumpstats.c.
The connection establishment was completely handled by backend.c which
normally just handles LB algos. Since it's purely TCP, it must move to
proto_tcp.c. Also, instead of calling it directly, we now call it via
the stream interface, which will later help us unify session handling.
The new statement "persist rdp-cookie" enables RDP cookie
persistence. The RDP cookie is then extracted from the RDP
protocol, and compared against available servers. If a server
matches the RDP cookie, then it gets the connection.
The RDP protocol is quite simple and documented, which permits
an easy detection and extraction of cookies. It can be useful
to match the MSTS cookie which can contain the username specified
by the client.
The HTTP processing has been splitted into 7 steps, one of which
is not anymore HTTP-specific (content-switching). That way, it
becomes possible to use "use_backend" rules in TCP mode. A new
"use_server" directive should follow soon.
Some stream analysers might become generic enough to be called
for several bits. So we cannot have the analyser bit hard coded
into the analyser itself. Let's make the caller inform the callee.
We want to split several steps in HTTP processing so that
we can call individual analysers depending on what processing
we want to perform. The first step consists in splitting the
part that waits for a request from the rest.
This is a first step towards support of multiple configuration files.
Now readcfgfile() only reads a file in memory and performs very minimal
parsing. The checks are performed afterwards.
Some users are already hitting the 64k source port limit when
connecting to servers. The system usually maintains a list of
unused source ports, regardless of the source IP they're bound
to. So in order to go beyond the 64k concurrent connections, we
have to manage the source ip:port lists ourselves.
The solution consists in assigning a source port range to each
server and use a free port in that range when connecting to that
server, either for a proxied connection or for a health check.
The port must then be put back into the server's range when the
connection is closed.
This mechanism is used only when a port range is specified on
a server. It makes it possible to reach 64k connections per
server, possibly all from the same IP address. Right now it
should be more than enough even for huge deployments.
Some users want to keep the max sessions/s seen on servers, frontends
and backends for capacity planning. It's easy to grab it while the
session count is updated, so let's keep it.
These functions will be used to deliver asynchronous signals in order
to make the signal handling functions more robust. The goal is to keep
the same interface to signal handlers.