Till now we used the fact that the dispatch address was not null to use
the dispatch mode. This is very unconvenient, so let's have a dedicated
option.
When doing a connect() on a stream interface, some information is needed
from the server and from the backend. In some situations, we don't have
a server and only a backend (eg: peers). In other cases, we know we have
an applet and we don't want to connect to anything, but we'd still like
to have the info about the applet being used.
For this, we now store a pointer to the "target" into the stream interface.
The target describes what's on the other side before trying to connect. It
can be a server, a proxy or an applet for now. Later we'll probably have
descriptors for multiple-stage chains so that the final information may
still be found.
This will help removing many specific cases in the code. It already made
it possible to remove the "srv" and "be" parameters to tcpv4_connect_server().
Bryan Talbot reported that POST requests with a query string were not
correctly processed if the hash parameter was the first one, because
the delimiter that was looked for to trigger the parsing was '&' instead
of '?'.
Also, while checking the code, it became apparent that it was enough for
a query string to be present in the request for POST parameters to be
ignored, even if the url_param was in the body and not in the URL.
The code has then been fixed like this :
1) look for URL param. If found, return it.
2) if no URL param was found and method is POST, then look it up into
the body
The code now seems to pass all request combinations.
This patch must be backported to 1.4 since 1.4 is equally broken right now.
Till now, the forwarding code was making use of the hdr_content_len member
to hold the size of the last chunk parsed. As such, it was reset after being
scheduled for forwarding. The issue is that this entry was reset before the
data could be viewed by backend.c in order to parse a POST body, so the
"balance url_param check_post" did not work anymore.
In order to fix this, we need two things :
- the chunk size (reset upon every forward)
- the total body size (not reset)
hdr_content_len was thus replaced by the former (hence the size of the patch)
as it makes more sense to have it stored that way than the way around.
This patch should be backported to 1.4 with care, considering that it affects
the forwarding code.
When the number of servers is a multiple of the size of the input set,
map-based hash can be inefficient. This typically happens with 64
servers when doing URI hashing. The "avalanche" hash-type applies an
avalanche hash before performing a map lookup in order to smooth the
distribution. The result is slightly less smooth than the map for small
numbers of servers, but still better than the consistent hashing.
Till now when a server was configured with address 0.0.0.0, the
connection was forwarded to this address which generally is intercepted
by the system as a local address, so this was completely useless.
One sometimes useful feature for outgoing transparent proxies is to
be able to forward the connection to the same address the client
requested. This patch fixes the meaning of 0.0.0.0 precisely to
ensure that the connection will be forwarded to the initial client's
destination address.
It's not normal to initialize the server-side stream interface from the
accept() function, because it may change later. Thus, we introduce a new
stream_sock_prepare_interface() function which is called just before the
connect() and which sets all of the stream_interface's callbacks to the
default ones used for real sockets. The ->connect function is also set
at the same instant so that we can easily add new server-side protocols
soon.
The 'client.c' file now only contained frontend-specific functions,
so it has naturally be renamed 'frontend.c'. Same for client.h. This
has also been an opportunity to remove some cross references from
files that should not have depended on it.
In the end, this file should contain a protocol-agnostic accept()
code, which would initialize a session, task, etc... based on an
accept() from a lower layer. Right now there are still references
to TCP.
Some ACLs in the client ought to belong to proto_tcp, or protocols.
This file should only contain frontend-specific information and will
be renamed that way in next commit.
Some functions which act on generic buffer contents without being
tcp-specific were historically in proto_tcp.c. This concerns ACLs
and RDP cookies. Those have been moved away to more appropriate
locations. Ideally we should create some new files for each layer6
protocol parser. Let's do that later.
This ACL was missing in complex setups where the status of a remote site
has to be considered in switching decisions. Until there, using a server's
status in an ACL required to have a dedicated backend, which is a bit heavy
when multiple servers have to be monitored.
Using get_ip_from_hdr2() we can look for occurrence #X or #-X and
extract the IP it contains. This is typically designed for use with
the X-Forwarded-For header.
Using "usesrc hdr_ip(name,occ)", it becomes possible to use the IP address
found in <name>, and possibly specify occurrence number <occ>, as the
source to connect to a server. This is possible both in a server and in
a backend's source statement. This is typically used to use the source
IP previously set by a upstream proxy.
The transparent proxy address selection was set in the TCP connect function
which is not the most appropriate place since this function has limited
access to the amount of parameters which could produce a source address.
Instead, now we determine the source address in backend.c:connect_server(),
right after calling assign_server_address() and we assign this address in
the session and pass it to the TCP connect function. This cannot be performed
in assign_server_address() itself because in some cases (transparent mode,
dispatch mode or http_proxy mode), we assign the address somewhere else.
This change will open the ability to bind to addresses extracted from many
other criteria (eg: from a header).
Isidore Li reported an occasional segfault when using URL hashing, and
kindly provided backtraces and core files to help debugging.
The problem was triggered by reset connections before the URL was sent,
and was due to the same bug which was fixed by commit e45997661bf
(connections were attempted in case of connection abort). While that
bug was already fixed, it appeared that the same segfault could be
triggered when URL hashing is configured in an HTTP backend when the
frontend runs in TCP mode and no URL was seen. It is totally abnormal
to try to hash a null URL, as well as to process any kind of L7 hashing
when a full request was not seen.
This additional fix now ensures that layer7 hashing is not performed on
incomplete requests.
This is used to force access to down servers for some requests. This
is useful when validating that a change on a server correctly works
before enabling the server again.
Some message pointers were not usable once the message reached the
HTTP_MSG_DONE state. This is the case for ->som which points to the
body because it is needed to parse chunks. There is one case where
we need the beginning of the message : server redirect. We have to
call http_get_path() after the request has been parsed. So we rely
on ->sol without counting on ->som. In order to achieve this, we're
making ->rq.{u,v} relative to the beginning of the message instead
of the buffer. That simplifies the code and makes it cleaner.
Preliminary tests show this is OK.
Supported informations, available via "tr/td title":
- cap: capabilities (proxy)
- mode: one of tcp, http or health (proxy)
- id: SNMP ID (proxy, socket, server)
- IP (socket, server)
- cookie (backend, server)
The rq.u field is relative to buf->data, not to msg->sol. We have
to subtract msg->som everywhere this error was made. Maybe it will
be simpler to have a pointer to the buffer in the message and find
appropriate data there.
When parsing body for URL parameters, we must not consider that
data are available from buf->data but from buf->data + msg->som.
This is not a problem right now but may become with keep-alive.
Now that the HTTP analyser will already have parsed the beginning
of the request body, we don't have to check for transfer-encoding
anymore since we have the current chunk size in hdr_content_len.
These ACLs are used to check the number of active connections on the
frontend, backend or in a backend's queue. The avg_queue returns the
average number of queued connections per server, and for this, divides
the total number of queued connections by the number of alive servers.
The dst_conn ACL has been slightly changed to more reflect its name and
original usage, which is to return the number of connections on the
destination address/port (the socket) and not the whole frontend.
Consistent hashing provides some interesting advantages over common
hashing. It avoids full redistribution in case of a server failure,
or when expanding the farm. This has a cost however, the hashing is
far from being perfect, as we associate a server to a request by
searching the server with the closest key in a tree. Since servers
appear multiple times based on their weights, it is recommended to
use weights larger than approximately 10-20 in order to smoothen
the distribution a bit.
In some cases, playing with weights will be the only solution to
make a server appear more often and increase chances of being picked,
so stats are very important with consistent hashing.
In order to indicate the type of hashing, use :
hash-type map-based (default, old one)
hash-type consistent (new one)
Consistent hashing can make sense in a cache farm, in order not
to redistribute everyone when a cache changes state. It could also
probably be used for long sessions such as terminal sessions, though
that has not be attempted yet.
More details on this method of hashing here :
http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/
There are a few remaining max values that need to move to counters.
Also, the counters are more often used than some config information,
so get them closer to the other useful struct members for better cache
efficiency.
The "static-rr" is just the old round-robin algorithm. It is still
in use when a hash algorithm is used and the data to hash is not
present, but it was impossible to configure it explicitly. This one
is cheaper in terms of CPU and supports unlimited numbers of servers,
so it makes sense to be able to use it.
LB algo macros were composed of the LB algo by itself without any indication
of the method to use to look up a server (the lb function itself). This
method was implied by the LB algo, which was not very convenient to add
more algorithms. Now we have several fields in the LB macros, some to
describe what to look for in the requests, some to describe how to transform
that (kind of algo) and some to describe what lookup function to use.
The next patch will make it possible to factor out some code for all algos
which rely on a map.
We need to remove hash map accesses out of backend.c if we want to
later support new hash methods. This patch separates the hash computation
method from the server lookup. It leaves the lookup function to lb_map.c
and calls it with the result of the hash.
It was becoming painful to have all the LB algos in backend.c.
Let's move them to their own files. A few hashing functions still
need be broken in two parts, one for the contents and one for the
map position.
There is no reason to inline functions which are used to grab a server
depending on an LB algo. They are large and used at several places.
Uninlining them saves 400 bytes of code.
Please consider the following patches. They are required to
compile haproxy-1.4-dev2 on FreeBSD.
Summary:
1) include <sys/types.h> before <netinet/tcp.h>
2) Use IPPROTO_TCP instead of SOL_TCP
(they are both defined as 6, TCP protocol number)
The connection establishment was completely handled by backend.c which
normally just handles LB algos. Since it's purely TCP, it must move to
proto_tcp.c. Also, instead of calling it directly, we now call it via
the stream interface, which will later help us unify session handling.
Andrew Azarov reported that haproxy-1.4-dev1 does not build
under FreeBSD 7.2 because SOL_TCP is not defined. So add a
check for its definition before using it. This only impacts
network optimisations anyway.
This Linux-specific option was never really used in production and
has since been superseded by new splicing options brought by recent
Linux kernels.
It caused several particular cases in the code because the kernel
would take care of the session without haproxy being able to do
anything on it, which became hard to handle in the new architecture.
Let's simply get rid of it now that there is a replacement available.
This patch adds support for hashing RDP cookies in order to
use them as a load-balancing key. The new "rdp-cookie(name)"
load-balancing metric has to be used for this. It is still
mandatory to wait for an RDP cookie in the frontend, otherwise
it will randomly work.
The splice code did not consider compatibility between both ends
of the connection. Now we set different capabilities on each
stream interface, depending on what the protocol can splice to/from.
Right now, only TCP is supported. Thanks to this, we're now able to
automatically detect when splice() is not implemented and automatically
disable it on one end instead of reporting errors to the upper layer.
This new option enables combining of request buffer data with
the initial ACK of an outgoing TCP connection. Doing so saves
one packet per connection which is quite noticeable on workloads
mostly consisting in small objects. The option is not enabled by
default.
Setting TCP_CORK on a socket before sending the last segment enables
automatic merging of this segment with the FIN from the shutdown()
call. Playing with TCP_CORK is not easy though as we have to track
the status of the TCP_NODELAY flag since both are mutually exclusive.
Doing so saves one more packet per session and offers about 5% more
performance.
There is no reason not to do it, so there is no associated option.
Some users are already hitting the 64k source port limit when
connecting to servers. The system usually maintains a list of
unused source ports, regardless of the source IP they're bound
to. So in order to go beyond the 64k concurrent connections, we
have to manage the source ip:port lists ourselves.
The solution consists in assigning a source port range to each
server and use a free port in that range when connecting to that
server, either for a proxied connection or for a health check.
The port must then be put back into the server's range when the
connection is closed.
This mechanism is used only when a port range is specified on
a server. It makes it possible to reach 64k connections per
server, possibly all from the same IP address. Right now it
should be more than enough even for huge deployments.
There is a patch made by me that allow for balancing on any http header
field.
[WT:
made minor changes:
- turned 'balance header name' into 'balance hdr(name)' to match more
closely the ACL syntax for easier future convergence
- renamed the proxy structure fields header_* => hh_*
- made it possible to use the domain name reduction to any header, not
only "host" since it makes sense to do it with other ones.
Otherwise patch looks good.
/WT]
The connect timeout was not properly detected due to the fact that
it was not correctly initialized. It must be set as the stream interface
timeout, not the buffer's write timeout.