By default on a single process, we accept 100 connections at once. This is too
many on recent CPUs, where the cache is constantly thrashing because we visit
all those connections several times. We should process them in smaller batches
so that all the accepted sessions may remain in cache during their initial
processing.
Lowering the batch size from 100 to 32 has changed the connection rate for
concurrencies between 5-10k from 67 kcps to 94 kcps on a Core i5 660 (4M L3),
and forward rates from 30k to 39.5k.
Tests on this hardware show that values between 10 and 30 seem to do the job fine.
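On versions where the batch size is tunable, it can be adjusted from the global
section; a minimal sketch, assuming the "tune.maxaccept" global setting is
available in your version:

    global
        tune.maxaccept 32   # accept at most 32 connections per wakeup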
When we fail to create a session because of memory shortage, let's at
least try to send a 500 message directly on the socket. Even if we don't
have any buffers left, the kernel's orphan management will take care of
delivering the message as long as there are socket buffers left.
Patch af5149 introduced an issue which can be detected only on out of
memory conditions : a LIST_DEL() may be performed on an uninitialized
struct member instead of a LIST_INIT() during the accept() phase,
causing crashes and memory corruption to occur.
This issue was detected and diagnosed by the Exceliance R&D team.
This is 1.5-specific and very recent, so no existing deployment should
be impacted.
The motivation for this is that when soft-restart is merged
it will become more important to free all relevant memory in deinit().
Discovered using valgrind.
This is used to perform cookie-based stickiness with table replication
between multiple masters and across restarts. This partially overrides
some of the appsession capabilities.
Never add connections allocated to this server to a stick-table.
This may be used in conjunction with backup to ensure that
stick-table persistence is disabled for backup servers.
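A minimal sketch of the intended use, assuming the server keyword is
"non-stick":

    backend bk_app
        stick-table type ip size 200k
        stick on src
        server main  192.168.1.10:80
        server spare 192.168.1.11:80 backup non-stick  # never stored in the table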
This pattern fetch function extracts the value of the rdp cookie <name> as
a string and uses this value to match. This enables implementation of
persistence based on the mstshash cookie. This is typically done if there
is no msts cookie present.
This differs from "balance rdp-cookie" in that any balancing algorithm may
be used and thus the distribution of clients to backend servers is not
linked to a hash of the RDP cookie. It is envisaged that using a balancing
algorithm such as "balance roundrobin" or "balance leastconn" will lead
to a more even distribution of clients to backend servers than the hash
used by "balance rdp-cookie".
Example :
listen tse-farm
bind 0.0.0.0:3389
# wait up to 5s for an RDP cookie in the request
tcp-request inspect-delay 5s
tcp-request content accept if RDP_COOKIE
# apply RDP cookie persistence
persist rdp-cookie
# Persist based on the mstshash cookie
# This only makes sense if
# balance rdp-cookie is not used
stick-table type string size 204800
stick on rdp_cookie(mstshash)
server srv1 1.1.1.1:3389
server srv2 1.1.1.2:3389
apsession_refresh() and apsess_refressh are only used inside apsession.c
and thus can be made static.
The only use of apsession_refresh() is appsession_task_init().
These functions have been re-ordered to avoid the need for
a forward-declaration of apsession_refresh().
If a connection is closed because the backend became unavailable,
then log 'D' as the termination condition.
Signed-off-by: Simon Horman <horms@verge.net.au>
This adds the "on-marked-down shutdown-sessions" statement on "server" lines,
which causes all sessions established on a server to be killed at once when
the server goes down. The task's priority is reniced to the highest value
(1024) so that servers holding many tasks don't cause a massive slowdown due
to the wakeup storm.
The motivation for this is to allow iteration of all the connections
of a server without the expense of iterating over the global list
of connections.
The first use of this will be to implement an option to close connections
associated with a server when it is marked as being down or in maintenance
mode.
* The declaration of peer_session_create() does
not match its definition. As it is only
used inside of peers.c make it static.
* Make the declaration of peers_register_table()
match its definition.
* Also, make all functions in peers.c that
are not also in peers.h static
gcc (Debian 4.6.0-2) 4.6.1 20110329 (prerelease)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
...
src/proto_http.c:3029:14: warning: variable ‘del_cl’ set but not used [-Wunused-but-set-variable]
In file included from ebtree/eb64tree.c:23:0:
ebtree/eb64tree.h: In function ‘__eb64_lookup’:
ebtree/eb64tree.h:128:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
ebtree/eb64tree.h: In function ‘__eb64i_lookup’:
ebtree/eb64tree.h:180:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
In file included from ebtree/ebpttree.h:26:0,
from ebtree/ebimtree.c:23:
ebtree/eb64tree.h: In function ‘__eb64_lookup’:
ebtree/eb64tree.h:128:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
ebtree/eb64tree.h: In function ‘__eb64i_lookup’:
ebtree/eb64tree.h:180:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
In file included from ebtree/ebpttree.h:26:0,
from ebtree/ebistree.h:25,
from ebtree/ebistree.c:23:
ebtree/eb64tree.h: In function ‘__eb64_lookup’:
ebtree/eb64tree.h:128:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
ebtree/eb64tree.h: In function ‘__eb64i_lookup’:
ebtree/eb64tree.h:180:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
mysqld >= 5.5 wants the client to announce 4.1+ authentication support, even if we have no password, so we do this.
I also checked on a Debian potato mysqld 3.22 and it works too, so I assume we are good from 3.22 to 5.5.
[WT: this must be backported to 1.4]
The fullconn value is not easy to get right when doing dynamic regulation,
as it should depend on the maxconns of the frontends that can reach a
backend. Since the parameter is mandatory, many configs are found with
an inappropriate default value.
Instead of rejecting configs without a fullconn value, we now set it to
10% of the sum of the configured maxconns of all the frontends which are
susceptible to branch to the backend. That way if new frontends are added,
the backend's fullconn automatically adjusts itself.
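For instance, with the hedged sketch below, the backend's fullconn would
default to 1500, i.e. 10% of 10000 + 5000:

    frontend fe1
        maxconn 10000
        default_backend dyn

    frontend fe2
        maxconn 5000
        default_backend dyn

    backend dyn
        # no fullconn set : defaults to 1500
        server srv1 192.168.1.10:80 minconn 50 maxconn 500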
Bashkim Kasa reported that the stats admin page did not work when colons
were used in server or backend names. This was caused by url-encoding
resulting in ':' being sent as '%3A'. Now we systematically decode the
field names and values to fix this issue.
Since version 1.0.0, it's forbidden to have a cookie specified without at
least one server. This test is useless and makes it complex to write APIs
to iteratively generate working configurations. Remove the test.
It's more expensive to call splice() on short payloads than to use
recv()+send(). One of the reasons is that doing a splice() involves
allocating a pipe. Another reason is that the kernel will have to
perform a copy itself if we try to splice less than a page. So let's
set a 4kB threshold below which we don't splice.
A quick test shows that on chunked encoded data, with splice we had
6826 syscalls (1715 splice, 3461 recv, 1650 send) while with this
patch, the same transfer resulted in 5793 syscalls (3896 recv, 1897
send).
Fast-forwarding between file descriptors is nice but can be counter-productive
when only one part of the buffer is forwarded, because it can result in doubling
the number of send() syscalls. This is what happens on HTTP chunking, because
the chunk data are sent, then the CRLF + next chunk size are parsed and immediately
scheduled for forwarding. This results in two send() for the same block while a
single one would have done it.
Now that we support the http-no-delay mode, we can optimize HTTP
chunking again by always waiting for more data to come until the
last chunk is met.
This patch may or may not be backported to 1.4, it's not a big deal,
it will mainly help for chunks which are aligned with the buffer size.
There are some very rare server-to-server applications that abuse the HTTP
protocol and expect the payload phase to be highly interactive, with many
interleaved data chunks in both directions within a single request. This is
absolutely not supported by the HTTP specification and will not work across
most proxies or servers. When such applications attempt to do this through
haproxy, it works but they will experience high delays due to the network
optimizations which favor performance by instructing the system to wait for
enough data to be available in order to only send full packets. Typical
delays are around 200 ms per round trip. Note that this only happens with
abnormal uses. Normal uses such as CONNECT requests or WebSockets are not
affected.
When "option http-no-delay" is present in either the frontend or the backend
used by a connection, all such optimizations will be disabled in order to
make the exchanges as fast as possible. Of course this offers no guarantee on
the functionality, as it may break at any other place. But if it works via
HAProxy, it will work as fast as possible. This option should never be used
by default, and should never be used at all unless such a buggy application
is discovered. The impact of using this option is an increase of bandwidth
usage and CPU usage, which may significantly lower performance in high
latency environments.
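For example, a minimal sketch restricting the option to the buggy application:

    backend buggy-app
        mode http
        option http-no-delay
        server srv1 192.168.1.10:8080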
This change should be backported to 1.4 since the first report of such a
misuse was in 1.4. Next patch will also be needed.
The FL_TCP flag was a leftover from the old days when we were using TCP_CORK.
With MSG_MORE it's not needed anymore, so we can remove the condition and
noticeably simplify the test.
When sending is complete, it's preferred to systematically clear the flags
that were set for that transfer. What could happen is that the to_forward
counter had caused the MSG_MORE flag to be set and BF_EXPECT_MORE not to
be cleared, resulting in this flag being unexpectedly maintained for the
next round.
The code has taken extreme care not to do this till now, but it's not
acceptable that the caller has to know these precise semantics. So let's
unconditionally clear the flag instead.
For the sake of safety, this fix should be backported to 1.4.
Commit 57f5c1 used to provide a nice improvement on chunked encoding since
it ensured that we did not set a PUSH flag for every chunk or buffer data
part of a chunked transfer.
Some applications appear to erroneously abuse HTTP chunking in order to
get interactive exchanges between a user agent and an origin server with
very small chunks. While this happens to work through haproxy, it's terribly
slow due to the latency added after passing each chunk to the system, which
could wait up to 200 ms before pushing them onto the wire.
So we need an interactive mode for such usages. In the meantime, step back
on the optimization, but not completely, so that we still keep the flag as
long as we know we're not finished with the current chunk.
This change should be backported to 1.4 too as the issue was discovered
with it.
This status code is used in response to requests matching "monitor-uri".
Some users need to adjust it to fit their needs (eg: make some strings
appear there). As it's already defined as a chunked string and used
exactly like other status codes, it makes sense to make it configurable
with the usual "errorfile", "errorloc", ...
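A minimal sketch (the file path is arbitrary):

    frontend www
        bind :80
        monitor-uri /haproxy_test
        errorfile 200 /etc/haproxy/errors/monitor-ok.http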
Some people like to make the monitoring URL testable from unsafe locations.
Reporting haproxy's existence there can sometimes be problematic. This patch
should not be backported to 1.4 because it is possible, even though unlikely,
that some scripts rely on this word to appear there.
As reported by Lauri-Alo Adamson, version 1.5-dev6 doesn't support
stick-tables with a binary type.
This issue was introduced in the commit 4f92d32 where a line was erroneously
deleted, and is 1.5-specific.
Mark Brooks reported that commit 1b4b7c broke tproxy in 1.5-dev6. Nick
Chalk tracked the issue down to a missing address family setting in
tcp_bind_socket() which resulted in a failure to use get_addr_len().
This issue is 1.5-specific.
Christopher Blencowe reported that the httpchk_expect() function was
lacking a test for incomplete responses : if the server sends only the
headers in the first packet and the body in a subsequent one, there is
a risk that the check fails without waiting for more data. A failure
rate of about 1% was reported.
This fix must be backported to 1.4.
When doing fix 24581bae02 to correctly handle
response cookies, an unfortunate typo was inserted in the less likely code
path, resulting in a risk of crash when cookie-based persistence is enabled
and the server emits a cookie with several spaces around the equal sign.
This bug was noticed during a code backport. Its effects were never reported
because this situation is very unlikely to appear, but it can be provoked on
purpose by the server.
This patch must be backported to 1.4 versions which contain the fix above
(anything > 1.4.8), and to similar 1.3 versions > 1.3.25. 1.5-dev versions
after 1.5-dev2 are affected too.
John Helliwell reported a runtime issue on Solaris since 1.5-dev5. Traces
show that connect() returns EINVAL, which means the socket length is not
appropriate for the family. Solaris does not like being passed
sizeof(struct sockaddr_storage) and needs the exact size of the address
family instead.
The fix consists in adding a get_addr_len() function which returns the
socket's address length based on its family. Tests show that this works
for both IPv4 and IPv6 addresses.
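A minimal sketch of the idea, not the exact haproxy implementation:

    #include <sys/socket.h>
    #include <netinet/in.h>

    /* Return the address length to pass to connect()/bind() for this
     * family instead of always using sizeof(struct sockaddr_storage). */
    static int get_addr_len(const struct sockaddr_storage *addr)
    {
        switch (addr->ss_family) {
        case AF_INET:  return sizeof(struct sockaddr_in);
        case AF_INET6: return sizeof(struct sockaddr_in6);
        default:       return 0;
        }
    }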
Since IPv6 is a different type than IPv4, the pattern fetch functions
src6 and dst6 were added. IPv6 stick-tables can also fetch IPv4 addresses
with src and dst. In this case, the IPv4 addresses are mapped to their
IPv6 counterpart, according to RFC 4291.
Patch 5ab04ec47c was incomplete,
because if the first send() fails on an empty buffer, we fail
to rearm the polling and we can't establish the connection
anymore.
The issue was reported by Ben Timby who provided large amounts
of traces of various tests helping to reliably reproduce the issue.
Since the latest additions to buffer_forward(), it became too large for
inlining, so let's uninline it. The code size drops by 3kB. Should be
backported to 1.4 too.
Despite much care around handling the content-length as a 64-bit integer,
forwarding was broken on 32-bit platforms due to the 32-bit nature of
the ->to_forward member of the "buffer" struct. The issue is that this
member is declared as a long, so while it works OK on 64-bit platforms,
32-bit ones truncate the content-length to the lower 32 bits.
One solution could consist in turning to_forward to a long long, but it
is used a lot in the critical path, so it's not acceptable to perform
all buffer size computations on 64-bit there.
The fix consists in changing the to_forward member to a strict 32-bit
integer and ensure in buffer_forward() that only the amount of bytes
that can fit into it is considered. Callers of buffer_forward() are
responsible for checking that their data were taken into account. We
arbitrarily ensure we never consider more than 2G at once.
That's the way it was intended to work on 32-bit platforms except that
it did not.
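The principle can be sketched as follows (a simplified illustration, not the
actual buffer_forward() code):

    #include <stdint.h>

    #define MAX_FORWARD 0x7fffffffU  /* never consider more than 2G-1 at once */

    /* Schedule up to <bytes> for forwarding into a 32-bit counter and
     * return how many bytes were actually taken; the caller is
     * responsible for re-submitting the remainder later. */
    static uint64_t forward_schedule(uint32_t *to_forward, uint64_t bytes)
    {
        if (bytes > MAX_FORWARD - *to_forward)
            bytes = MAX_FORWARD - *to_forward;
        *to_forward += (uint32_t)bytes;
        return bytes;
    }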
This issue was tracked down hard at Exosec with Bertrand Jacquin,
Thierry Fournier and Julien Thomas. It remained undetected for a long
time because files larger than 4G are almost always transferred in
chunked-encoded format, and most platforms dealing with huge contents
these days run on 64-bit.
The bug affects all 1.5 and 1.4 versions, and must be backported.
Since we now have the copy of the target in the session, use it instead
of relying on the SI for it. The SI drops the target upon unregister()
so applets such as stats were logged as "NOSRV".
Johannes Smith reported some wrong retries count in logs associated with bad
requests. The cause was that the conn_retries field in the stream interface
was only initialized when attempting to connect, but is used when logging,
possibly with an uninitialized value holding last connection's conn_retries.
This could have been avoided by making use of a stream interface initializer.
This bug is 1.5-specific.
Function gethostbyname is deprecated since IEEE Std 1003.1-2008 and
was replaced by getaddrinfo (available since IEEE Std 1003.1-2001).
Contrary to gethostbyname, getaddrinfo is specified to support both
IPv4 and IPv6 addresses.
Since some libcs don't handle getaddrinfo properly, the constant
USE_GETADDRINFO must be defined at compile time to enable the use of
getaddrinfo.
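A hedged sketch of getaddrinfo-based resolution handling both families:

    #include <netdb.h>
    #include <string.h>
    #include <sys/socket.h>

    /* Resolve <host> into <ss>; returns 0 on success. Unlike
     * gethostbyname(), this works for both IPv4 and IPv6. */
    static int resolve_host(const char *host, struct sockaddr_storage *ss)
    {
        struct addrinfo hints, *res;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family   = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        if (getaddrinfo(host, NULL, &hints, &res) != 0)
            return -1;
        memcpy(ss, res->ai_addr, res->ai_addrlen);
        freeaddrinfo(res);
        return 0;
    }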
It's always been a mess to debug wrong listening addresses because
the parsing function does not indicate the file and line number. Now
it does. Since the code was almost a duplicate of str2sa_range, it
now makes use of it and has been considerably reduced.
The parser now distinguishes between pure addresses and address:port. This
is useful for some config items where only an address is required.
Raw IPv6 addresses are now parsed, but IPv6 host name resolution is still not
handled (gethostbyname does not resolve IPv6 names to addresses).
This option enables use of the PROXY protocol with the server, which
allows haproxy to transport the original client's address across multiple
architecture layers.
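A minimal sketch, assuming the server-line keyword is "send-proxy":

    backend bk_app
        server srv1 192.168.1.10:80 send-proxy  # prepend a PROXY line on connect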
Upon connection establishment, stream_sock is now able to send a PROXY
line before sending any data. Since it's possible that the buffer is
already full, and we don't want to allocate a block for that line, we
compute it on-the-fly when we need it. We just store the offset from
which to (re-)send from the end of the line, since it's assumed that
multiple outputs of the same proxy line will be strictly equivalent. In
practice, one call is enough. We just make sure to handle the case where
the first send() would indicate an incomplete output, even though it's
very unlikely to ever happen.
If the check fails for a low-level socket error (eg: address family not
supported), we currently ignore the status. We must report the error and
declare a failed health check in this case. The only real reason for this
would be when an IPv6 check is required on an IPv4-only system.
And also rename "req_acl_rule" "http_req_rule". At the beginning that
was a bit confusing to me, especially the "req_acl" list which in fact
holds what we call rules. After some digging, it appeared that some
part of the code is 100% HTTP and not just related to authentication
anymore, so let's move that part to HTTP and keep the auth-only code
in auth.c.
Right now, http-request rules are not evaluated if the URL matches the
stats request. This is quite unexpected. For instance, in the config
below, an abuser present in the abusers list will not be prevented from
accessing the stats.
listen pub
bind :8181
acl abuser src -f abusers.lst
http-request deny if abuser
stats uri /stats
It is not a big deal but it's not documented as such either. For 1.5, let's
have both lists be evaluated in turn, until one blocks. For 1.4 we'll simply
update the doc to indicate that.
Also instead of duplicating the code, the patch factors out the list walking
code. The HTTP auth has been moved slightly earlier, because it was set after
the header addition code, but we don't need to add headers to a request we're
dropping.
It's very annoying that frontend and backend stats are merged because we
don't know what we're observing. For instance, if a "listen" instance
makes use of a distinct backend, it's impossible to know what the bytes_out
means.
Some places in the code take care of not updating counters twice if the
backend points to the frontend, indicating a "listen" instance. The thing
becomes more
complex when we try to add support for server side keep-alive, because we
have to maintain a pointer to the backend used for last request, and to
update its stats. But we can't perform such comparisons anymore because
the counters will not match anymore.
So in order to get rid of this situation, let's have both frontend AND
backend stats in the "struct proxy". We simply update the relevant ones
during activity. Some of them are only accounted for in the backend,
while others are just for frontend. Maybe we can improve a bit on that
later, but the essential part is that those counters now reflect what
they really mean.
This patch turns internal server addresses to sockaddr_storage to
store IPv6 addresses, and makes the connect() function use it. This
code already works but some caveats with getaddrinfo/gethostbyname
still need to be sorted out while the changes had to be merged at
this stage of internal architecture changes. So for now the config
parser will not emit an IPv6 address yet so that user experience
remains unchanged.
This change should have absolutely zero user-visible effect, otherwise
it's a bug introduced during the merge, that should be reported ASAP.
This one has been removed and is now totally superseded by ->target.
To get the server, one must use target_srv(&s->target) instead of
s->srv now.
The function ensures that non-server targets still return NULL.
s->prev_srv is used by assign_server() only, but all code paths leading
to it now take s->prev_srv from the existing s->srv. So assign_server()
can do that copy into its own stack.
If at one point a different srv is needed, we still have a copy of the
last server on which we failed a connection attempt in s->target.
When dealing with HTTP keep-alive, we'll have to know if we can reuse
an existing connection. For that, we'll have to check if the current
connection was made on the exact same target (referenced in the stream
interface).
Thus, we need to first assign the next target to the session, then
copy it to the stream interface upon connect(). Later we'll check for
equivalence between those two operations.
Since all of them are defined as proxy options, it's better to ensure
that at most one of them is enabled at once. The priority has been set
according to what is already performed in the backend :
1) dispatch
2) http_proxy
3) transparent
Till now we used the fact that the dispatch address was not null to use
the dispatch mode. This is very inconvenient, so let's have a dedicated
option.
This is in fact where those parts belong. The old data_state was replaced
by applet.state and is now initialized when the applet is registered. It's
worth noting that the applet does not need to know the session nor the
buffer anymore since everything is brought by the stream interface.
It is possible that having a separate applet struct would simplify the
code but that's not a big deal.
With HTTP keep-alive, logging the right server name will be quite
complex because the assigned server will possibly change before we log.
Also, when we want to log accesses to an applet, it's not easy because
the applet becomes NULL again before logging.
The logged server's name is now taken from the target stored in the
stream interface. That way we can log an applet, a server name, or we
could even log a proxy or anything else if we wanted to. Ideally the
session should contain a desired target which is the one which should
be logged.
Now that we have the target pointer and type in the stream interface,
we don't need the applet.handler pointer anymore. That makes the code
somewhat cleaner because we know we're dealing with an applet by checking
its type instead of checking the pointer is not null.
When doing a connect() on a stream interface, some information is needed
from the server and from the backend. In some situations, we don't have
a server and only a backend (eg: peers). In other cases, we know we have
an applet and we don't want to connect to anything, but we'd still like
to have the info about the applet being used.
For this, we now store a pointer to the "target" into the stream interface.
The target describes what's on the other side before trying to connect. It
can be a server, a proxy or an applet for now. Later we'll probably have
descriptors for multiple-stage chains so that the final information may
still be found.
This will help removing many specific cases in the code. It already made
it possible to remove the "srv" and "be" parameters to tcpv4_connect_server().
I/O handlers are still delicate to manipulate. They have no type, they're
just raw functions which have no knowledge of themselves. Let's have them
declared as applets once for all. That way we can have multiple applets
share the same handler functions and we can store their names there. When
we later need to add more parameters (eg: usage stats), we'll be able to
do so in the applets themselves.
The CLI functions have been prefixed with "cli" instead of "stats" as it's
clearly what is going on there.
The applet descriptor in the stream interface should get all the applet
specific data (st0, ...) but this will be done in the next patch so that
we don't pollute this one too much.
Both Hank A. Paulson and Rob at pixsense reported a crash when
loading ACLs from a pattern file which contains empty lines.
From the tests, it appears that only files that contain nothing
but empty lines are causing that (in the past they would have had
their line feeds loaded as patterns).
The crash happens in the free_pattern() call which doesn't like to
be called with a NULL pattern. Let's make it accept it so that it's
more in line with the standard uses of free() which ignores NULLs.
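The shape of the fix, as a simplified hypothetical version:

    #include <stdlib.h>

    struct pattern {
        void *ptr;   /* the stored pattern data */
        int len;
    };

    /* Ignore NULL like free() does so that callers need no check. */
    static void free_pattern(struct pattern *pat)
    {
        if (!pat)
            return;
        free(pat->ptr);
        free(pat);
    }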
Similar to the stats socket bug, we must check that the proxy is not disabled
before trying to enable/disable a server.
Even if a disabled proxy is not displayed, someone can inject a faulty proxy
name in the POST parameters. So, we must ensure that no disabled proxy can be
used.
As reported by Bryan Talbot, enabling and disabling a server in a disabled
proxy causes a segfault.
Changing the weight can also cause a similar segfault.
Bryan Talbot reported that POST requests with a query string were not
correctly processed if the hash parameter was the first one, because
the delimiter that was looked for to trigger the parsing was '&' instead
of '?'.
Also, while checking the code, it became apparent that it was enough for
a query string to be present in the request for POST parameters to be
ignored, even if the url_param was in the body and not in the URL.
The code has then been fixed like this :
1) look for URL param. If found, return it.
2) if no URL param was found and method is POST, then look it up into
the body
The code now seems to pass all request combinations.
This patch must be backported to 1.4 since 1.4 is equally broken right now.
Till now, the forwarding code was making use of the hdr_content_len member
to hold the size of the last chunk parsed. As such, it was reset after being
scheduled for forwarding. The issue is that this entry was reset before the
data could be viewed by backend.c in order to parse a POST body, so the
"balance url_param check_post" did not work anymore.
In order to fix this, we need two things :
- the chunk size (reset upon every forward)
- the total body size (not reset)
hdr_content_len was thus replaced by the former (hence the size of the patch)
as it makes more sense to have it stored that way than the other way around.
This patch should be backported to 1.4 with care, considering that it affects
the forwarding code.
It seems like if a response message is chunked and the chunk size wraps
at the end of the buffer and the crlf sequence is incomplete, then we
can forward a wrong chunk size due to incorrect handling of the wrapped
size. It seems extremely unlikely to occur on real traffic (no reason to
have half of the CRLF after a chunk) but nothing prevents it from being
possible.
This fix must be backported to 1.4.
As reported by the Loadbalancer.org team, it was not possible to bind
more than 1024 ports. This is because the process' limits were set after
trying to bind the sockets, which defeats their purpose.
This fix must be backported to 1.4 and 1.3.
We used to only count one socket instead of one per listener. This makes
the socket count wrong, preventing from automatically computing the proper
number of sockets to bind.
This fix must be backported to 1.4 and 1.3.
req_acl was used instead of req_acl_final. As a matter of luck, both
happen to be the same at this point, but this is not granted in the
future.
This fix should be backported to 1.4.
Some browsers send POST requests in several packets, which was not supported
by the "stats admin" function.
This patch makes it possible to wait for more data when they are not fully
received (we are still limited to a certain size defined by the buffer size
minus its reserved space).
It also adds support for the "Expect: 100-Continue" header.
Stefan Behte reported a strange case where depending on the position of
the Connection header in the header list, some headers added after it
were or were not usable in "balance hdr()". The reason is that when the
last header is removed, the list's tail was not updated, so any header
added after that one was not visible from the list.
This fix must be backported to 1.4 and possibly 1.3.
While working further on the changes to allow dynamic adding/removing
of backend servers, we noticed a potential problem: the path given for
the 'stats socket' global option may get truncated when copying it into
the sockaddr_un.sun_path field.
The attached patch checks the length and reports an error if truncation
would happen.
This issue was noticed by Joerg Sonnenberger <joerg@NetBSD.org>.
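The principle of the check, sketched (not the exact patch):

    #include <string.h>
    #include <sys/un.h>

    /* Reject paths that would not fit into sun_path instead of silently
     * truncating them; returns 0 on success, -1 on overflow. */
    static int set_unix_path(struct sockaddr_un *un, const char *path)
    {
        if (strlen(path) >= sizeof(un->sun_path))
            return -1;
        strcpy(un->sun_path, path);
        return 0;
    }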
src/frontend.c: In function 'frontend_accept':
src/frontend.c:110: warning: pointer targets in passing argument 5 of 'getsockopt' differ in signedness
The argument should be socklen_t and not int.
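Illustration of the correct declaration (SO_ERROR is used here only as an
example option):

    #include <sys/socket.h>

    static int sock_error(int fd)
    {
        int val = 0;
        socklen_t len = sizeof(val);  /* was declared 'int', hence the warning */

        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &val, &len) < 0)
            return -1;
        return val;
    }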
I have written a small patch to enable a proper PostgreSQL health check.
It works similarly to mysql-check, with the very same parameters.
E.g.:
listen pgsql 127.0.0.1:5432
mode tcp
option pgsql-check user pgsql
server masterdb pgsql.server.com:5432 check inter 10000
It's better to avoid sticking on empty parameter values, as this almost
always indicates a missing parameter. Otherwise it's easy to enter a
situation where all new visitors stick to the same server.
Revert commits 035da6d1b0 and
f18b5f21ba.
These fixes were wrong. They worked but they were fixing the symptom
instead of the root cause of the problem. The real issue was in the
ebtree lookup code and it has been fixed now so these patches are not
needed anymore. It's better not to copy memory blocks when we don't
need to, so let's revert them.
Commit 035da6d1b0 was incorrect as it
could modify a live buffer. We must first ensure that we're on the
private buffer or perform a copy before modifying the data.
Gabriel Sosa reported that haproxy unexpectedly reports an error
when a pattern file loaded by an ACL contains an empty line. The
test was present but inefficient as it did not consider the '\n'
as the end of the line. This fix relies on the line length instead.
It should be backported to 1.4.
If a key to be looked up is extracted from data without being padded
and if it matches the beginning of another stored key, it is not
found in subsequent lookups because it does not end with a zero.
This bug was discovered and diagnosed by David Cournapeau.
One of the requirements we have is to run multiple instances of haproxy on a
single host; this is so that we can split the responsibilities (and change
permissions) between product teams. An issue we ran up against is how we
would distinguish between the logs generated by each instance. The solution
we came up with (please let me know if there is a better way) is to override
the application tag written to syslog. We can then configure syslog to write
these to different files.
I have attached a patch adding a global option 'log-tag' to override the
default syslog tag 'haproxy' (actually defaults to argv[0]).
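For example:

    global
        log 127.0.0.1:514 local0
        log-tag haproxy-teamA   # written to syslog instead of 'haproxy'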
By passing a negative value to the "mss" argument of "bind" lines, it
becomes possible to subtract this value from the MSS advertised by the
client, which results in segments smaller than advertised. The effect
is useful with some TCP stacks which ACK less often when segments are
not full, because they only ACK every other full segment as suggested
by RFC1122.
NOTE: currently this has no effect on Linux kernel 2.6, a kernel patch
is still required to change the MSS of established connections.
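For example:

    frontend ft_lan
        bind :80 mss -40   # use the client's advertised MSS minus 40 bytes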
Haproxy includes the IP of the machine rather than its hostname in the
syslog headers it sends. Unfortunately this means that for each log line
rsyslog does a reverse DNS lookup on the client IP, and in the case of
non-routable IPs one gets the public hostname, not the internal one.
While this is valid according to RFC3164, as one might imagine this is
troublesome if you have some machines with public IPs, internal IPs, no
reverse DNS entries, etc. and you want a standardized hostname-based log
directory structure. The RFC says the preferred value is the hostname.
This patch adds a global "log-send-hostname" statement which accepts an
optional string to force the host name. If unset, the local host name
is used.
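For example:

    global
        log-send-hostname weblb1   # host name field in the syslog header
        log 10.0.0.1:514 local0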
Since haproxy 1.4.9, combining option httpclose and option
http-pretend-keepalive can leave connections open until the backend
keep-alive timeout is reached, resulting in poor performance.
The same can occur when the proxy is in tunnel mode.
This patch ensures that the server-side connection is closed after the
response, and ignores http-pretend-keepalive in tunnel mode.
When a connection error is encountered on a server and the server's
connection pool is full, pending connections are not woken up because
the current connection is still accounted for on the server, so it
still appears full. This becomes visible on a server which has
"maxconn 1" because the pending connections will only be able to
expire in the queue.
Now we take care of releasing our current connection before trying to
offer it to another pending request, so that the server can accept a
next connection.
This patch should be backported to 1.4.
When a client connection aborts while the server-side connection is in
turn-around after a failed connection attempt, the turn-around timeout
is reset in shutw() but the state is not changed. The session then
remains stuck in this state forever. Change the QUE and TAR states to
DIS just as we do for CER to fix this.
This patch should be backported to 1.4.
We've had several issues related to data transfers. First, if a
client aborted an upload before the server started to respond, it
would get a 502 followed by a 400. The same was true (in the other
way around) if the server suddenly aborted while the client was
uploading the data.
The flags reported in the logs were misleading. Request errors could
be reported while the transfer was stopped during the data phase. The
status codes could also be overwritten by a 400 even though the start
of the response was transferred to the client.
The stats were also wrong in case of data aborts. The server or the
client could sometimes be miscredited for being the author of the
abort depending on where the abort was detected. Some client aborts
could also be accounted as request errors and some server aborts as
response errors.
Now it seems like all such issues are fixed. Since we don't have a
specific state for data flowing from the client to the server
before the server responds, we're still counting the client aborted
transfers as "CH", and they become "CD" when the server starts to
respond. Ideally a "P" state would be desired.
This patch should be backported to 1.4.
HTTP pipelining currently needs to monitor the response buffer to wait
for some free space to be able to send a response. It was not possible
for the HTTP analyser to be called based on response buffer activity.
Now we introduce a new buffer flag BF_WAKE_ONCE which is set when the
HTTP request analyser is set on the response buffer and some activity
is detected. This is not clean at all but one of the only ways to fix
the issue before we make it possible to register events for analysers.
Also it appeared that one realign condition did not cover all cases.
When using haproxy in multi-process mode (nbproc > 1), some features may not
be fully compatible or may not work at all. haproxy will now display a warning
on startup for :
- appsession
- sticking rules
- stats / stats admin
- stats socket
- peers (fatal error in that case)
This counter will help quickly spot whether there are new errors or not.
It is also assigned to each capture so that a script can keep track of
which capture was taken when.
It is possible to block on incorrectly chunked requests or responses,
but this becomes very hard to debug when it happens once in a while.
This patch adds the ability to also capture incorrectly chunked requests
and responses. The chunk will appear in the error buffer and will be
verifiable with the usual "show errors". The incorrect byte will match
the error location.
Error captures did only support contiguous messages. This is annoying
for capturing chunking errors, so let's ensure the function is able to
copy wrapped messages.
When an error message is returned to a client, all buffer contents
were left intact. Since the analysers were removed, the potentially
invalid data that were read had a chance to be sent too.
Now we ensure we only keep the already scheduled data in the buffer
and we truncate it after that. That means that responses with data
that must be blocked will really be blocked, and that incorrectly
chunked data will be stopped at the point where the chunking fails.
When haproxy parses chunk-encoded data that are scheduled to be sent, it is
possible that the other end is closed (mainly due to a client abort returning
as an error). The message state thus changes to HTTP_MSG_ERROR and the error
is reported as a chunk parsing error ("PD--") while it is not. Detect this
case before setting the flags and set the appropriate flag in this case.
Debugging parsing errors can be greatly improved if we know what the parser
state was and what the buffer flags were (especially for closed inputs/outputs
and full buffers). Let's add that to the error snapshots.
When forwarding chunk-encoded data, each chunk gets a TCP PUSH flag when
going onto the wire simply because the send() function does not know that
some data remain after it (next chunk). Now we set the BF_EXPECT_MORE flag
on the buffer if the chunk size is not null. That way we can reduce the
number of packets sent, which is particularly noticeable when forwarding
compressed data, especially as it requires less ACKs from the client.
When the number of servers is a multiple of the size of the input set,
map-based hash can be inefficient. This typically happens with 64
servers when doing URI hashing. The "avalanche" hash-type applies an
avalanche hash before performing a map lookup in order to smooth the
distribution. The result is slightly less smooth than the map for small
numbers of servers, but still better than the consistent hashing.
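A minimal sketch, assuming "hash-type" accepts this value directly in this
version:

    backend static
        balance uri
        hash-type avalanche
        server s1 192.168.1.1:80
        server s2 192.168.1.2:80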
We'll use this hash at other places, let's make it globally available.
The function has also been renamed because its "chash_hash" name was
not appropriate.
When a header is removed, the previous header's next pointer is updated
to reflect the next of the current header. However, when cycling through
the loop, we update the prev pointer to point to the deleted header, which
means that if we delete another header, it's the deleted header's next
pointer that will be updated, leaving the deleted header in the list with
a null length, which is forbidden.
We must just not update the prev pointer after a removal.
This bug was present when either "reqdel" or "rspdel" removed two consecutive
headers. It could also occur when removing cookies in either requests or
responses, but since cookies were the last headers processed, the issue
remained unnoticed.
Issue reported by Hank A. Paulson.
This fix must be ported to 1.4 and possibly 1.3.
Cookies in indirect mode are removed from the cookie header. Three pointers
ought to be updated when appsession cookies are processed next, but were not.
The result is that a memcpy() can be called with a negative value causing the
process to crash. It is not sure whether this can be remotely exploited or not.
(cherry picked from commit c5f3749aa3ccfdebc4992854ea79823d26f66213)
In out of memory conditions, the ->destroy function would free all
possibly allocated pools from the current appsession, including those
that were not yet allocated nor assigned, which used to point to a
previous allocation, obviously resulting in a segfault.
(cherry picked from commit 75eae485921d3a6ce197915c769673834ecbfa5c)
In case of out of memory, it was possible to write to a null pointer
when capturing response cookies due to a missing "else" block. The
request handling was fine though.
(cherry picked from commit 62e3604d7dd27741c0b4c9e27d9e7c73495dfc32)
When running with -vv or -V -d, the list of usable polling systems
is reported. The final selection did not take into account the
possible failures during the tests, which is misleading and could
make one think that a non-working poller will be used, while it is
not the case. Fix that to really report the correct ones.
(cherry picked from commit 6d0e354e0171f08b7b3868ad2882c3663bd068a7)
Since unix sockets are supported for bind, the default backlog size was not
enough to accept the traffic. The size is now inherited from the listener
to behave like the tcp listeners.
This also affects the "stats socket" backlog, which is now determined by
"stats maxconn".
Some distros' libc is built for CPUs earlier than i686 and as such does
not offer support for the Linux kernel's faster vsyscalls. This code adds
a new build option USE_VSYSCALLS to bypass libc for most commonly used
system calls. A net gain of about 10% can be observed with this change
alone.
It only works when /proc/sys/abi/vsyscall32 equals exactly 2. When it's
set to 1, the VDSO is randomized and cannot be used.
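The flag is passed at build time like the other USE_* options; for instance
(assuming an i686 Linux 2.6 target):

    make TARGET=linux26 ARCH=i686 USE_VSYSCALLS=1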
Analysers were re-evaluated when some flags were still present in the
buffers, even if they had not changed since previous pass, resulting
in a waste of CPU cycles.
Ensuring that the flags have changed has saved some useless calls :

    function                      min calls per session (before -> after)
    http_request_forward_body                 5 -> 4
    http_response_forward_body                3 -> 2
    http_sync_req_state                      10 -> 8
    http_sync_res_state                       8 -> 6
    http_resync_states                        8 -> 6
The stream_sock's accept() used to close the FD upon error, but this
was also sometimes performed by the frontend's accept() called via the
session's accept(). Those interlaced calls were also responsible for the
spaghetti-looking error unrolling code in session.c and stream_sock.c.
Now the frontend must not close the FD anymore, the session is responsible
for that. It also takes care of just closing the FD or also removing from
the FD lists, depending on its state. The socket-level accept() does not
have to care about that anymore.
Some Alert() messages remained in the accept() path, where they would
have no chance of being detected. Remove some of them (the impossible
ones) and replace the relevant ones with send_log() so that the admin
has a chance to catch them.