commit a1cc3811 introduced an undesirable \0\n ending on HTTP log messages. This
is because of an extra character count passed to __send_log() which causes the LF
to be appended past the \0. Some syslog daemons thus log an extra empty line. The
fix is obvious. Fix the function comments to remind what they expect on their input.
This is past 1.5-dev7 regression so there's no backport needed.
http_sess_log now use the logformat linked list to make the log
string, snprintf is not used for speed issue.
CLF mode also uses logformat.
NOTE: as of now, empty fields in CLF now are "" not "-" anymore.
Marcello Gorlani reported that commit 5e205524ad24003ecc4dbb435066aebe7ed58d95
(BUG: http: re-enable TCP quick-ack upon incomplete HTTP requests) broke build
on FreeBSD.
Moving the include lower fixes the issue. This must be backported to 1.4 too.
These ones are invalid and blocked unless "option accept-invalid-http-request"
is specified in the frontend. In any case, the faulty request is logged.
Note that some of the remaining invalid chars are still not checked against,
those are the invalid ones between 32 and 127 :
34 ('"'), 60 ('<'), 62 ('>'), 92 ('\'), 94 ('^'),
96 ('`'), 123 ('{'), 124 ('|'), 125 ('}')
Using a lookup table might be better at some point.
The HTTP request parser was considering that any non-LWS char was
par of the URI. Unfortunately, this allows control chars to be sent
in the URI, sometimes resulting in backend servers misbehaving, for
instance when they interprete \0 as an end of string and respond
with plain HTTP/0.9 without headers, that haproxy blocks as invalid
responses.
RFC3986 clearly states the list of allowed characters in a URI. Even
non-ASCII chars are not allowed. Unfortunately, after having run 10
years with these chars allowed, we can't block them right now without
an optional workaround. So the first step consists in only blocking
control chars. A later patch will allow non-ASCII only when an appropriate
option is enabled in the frontend.
Control chars are 0..31 and 127, with the exception of 9, 10 and 13
(\t, \n, \r).
New option "http-send-name-header" specifies the name of a header which
will hold the server name in outgoing requests. This is the name of the
server the connection is really sent to, which means that upon redispatches,
the header's value is updated so that it always matches the server's name.
This pattern previously was limited to type IP. With the new header
extraction function, it becomes possible to extract strings, so that
the header can be returned as a string. This will not change anything
to existing configs, as string will automatically be converted to IP
when needed. However, new configs will be able to use IPv6 addresses
from headers in stick-tables, as well as stick on any non-IP header
(eg: host, user-agent, ...).
The new function does not return IP addresses but header values instead,
so that the caller is free to make what it want of them. The conversion
is not quite clean yet, as the previous test which considered that address
0.0.0.0 meant "no address" is still used. A different IP parsing function
should be used to take this into account.
Now strings and data blocks are stored in the temp_pattern's chunk
and matched against this one.
The rdp_cookie currently makes extensive use of acl_fetch_rdp_cookie()
and will be a good candidate for the initial rework so that ACLs use
the patterns framework and not the other way around.
IPv4 and IPv6 addresses are now stored into temp_pattern instead of
the dirty hack consisting into storing them into the consumer's target
address.
Some refactoring should now be possible since the methods used to fetch
source and destination addresses are similar between patterns and ACLs.
All ACL fetches which return integer value now store the result into
the temporary pattern struct. All ACL matches which rely on integer
also get their value there.
Note: the pattern data types are not set right now.
By default we disable TCP quick-acking on HTTP requests so that we
avoid sending a pure ACK immediately followed by the HTTP response.
However, if the client sends an incomplete request in a short packet,
its TCP stack might wait for this packet to be ACKed before sending
the rest of the request, delaying incoming requests by up to 40-200ms.
We can detect this undesirable situation when parsing the request :
- if an incomplete request is received
- if a full request is received and uses chunked encoding or advertises
a content-length larger than the data available in the buffer
In these situations, we re-enable TCP quick-ack if we had previously
disabled it.
This patch settles the 2 loggers limitation.
Loggers are now stored in linked lists.
Using "global log", the global loggers list content is added at the end
of the current proxy list. Each "log" entries are added at the end of
the proxy list.
"no log" flush a logger list.
Stream interfaces used to distinguish between client and server addresses
because they were previously of different types (sockaddr_storage for the
client, sockaddr_in for the server). This is not the case anymore, and this
distinction is confusing at best and has caused a number of regressions to
be introduced in the process of converting everything to full-ipv6. We can
now remove this and have a much cleaner code.
This patch introduces hdr_len, path_len and url_len for matching these
respective parts lengths against integers. This can be used to detect
abuse or empty headers.
Commit 588bd4 fixed header parsing so that trailing spaces were not part
of the returned string. Unfortunately, if a header only had spaces, the
last spaces were trimmed past the beginning of the value, causing a negative
length to be returned.
A quick code review shows that there should be no impact since the only
places where the vlen is used are either compared to a specific value or
with explicit contents (eg: digits).
This must be backported to 1.4.
These requests are mainly monitor requests, as well as stats requests when
the stats are processed by the frontend. Having this counter helps explain
the difference in number of sessions that is sometimes observed between a
frontend and a backend.
Sometimes a bad content-length header is encountered and this causes
an abort. It's hard to debug without a trace, so let's take a capture
of the contents when this happens.
If a server starts to respond but stops before the body, then we
capture the truncated response. We don't do this on the request
because it would happen too often upon stupid attacks.
Trailing spaces after headers were not trimmed, only the leading ones
were. An issue was detected today with a content-length value which
was padded with spaces and which was rejected. Recent updates to the
http-bis draft made it a lot more clear that such spaces must be ignored,
so this is what this patch does.
It should be backported to 1.4.
Many inet_ntop calls were partially right, which was hard to detect given
the complex combinations. Some of them were relying on the listener's proto
instead of the address itself, which could have been different when dealing
with an accept-proxy connection.
The new addr_to_str() function does the dirty job and returns the family, which
makes it particularly suited to calls from switch/case statements. A large number
of if/else statements were removed and the stats output could even be cleaned up
in the case of session dump.
As a side effect of doing this, the resulting code is smaller by almost 1kB.
All changed parts have been tested and provided expected output.
If "option forwardfor" has the "if-none" argument, then the header is
only added when the request did not already have one. This option has
security implications, and should not be set blindly.
This is used to perform cookie-based stickiness with table replication
between multiple masters and across restarts. This partially overrides
some of the appsession capabilities.
The motivation for this is to allow iteration of all the connections
of a server without the expense of iterating over the global list
of connections.
The first use of this will be to implement an option to close connections
associated with a server when is is marked as being down or in maintenance
mode.
gcc (Debian 4.6.0-2) 4.6.1 20110329 (prerelease)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
...
src/proto_http.c:3029:14: warning: variable ‘del_cl’ set but not used [-Wunused-but-set-variable]
In file included from ebtree/eb64tree.c:23:0:
ebtree/eb64tree.h: In function ‘__eb64_lookup’:
ebtree/eb64tree.h:128:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
ebtree/eb64tree.h: In function ‘__eb64i_lookup’:
ebtree/eb64tree.h:180:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
In file included from ebtree/ebpttree.h:26:0,
from ebtree/ebimtree.c:23:
ebtree/eb64tree.h: In function ‘__eb64_lookup’:
ebtree/eb64tree.h:128:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
ebtree/eb64tree.h: In function ‘__eb64i_lookup’:
ebtree/eb64tree.h:180:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
In file included from ebtree/ebpttree.h:26:0,
from ebtree/ebistree.h:25,
from ebtree/ebistree.c:23:
ebtree/eb64tree.h: In function ‘__eb64_lookup’:
ebtree/eb64tree.h:128:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
ebtree/eb64tree.h: In function ‘__eb64i_lookup’:
ebtree/eb64tree.h:180:6: warning: variable ‘node_bit’ set but not used [-Wunused-but-set-variable]
Bashkim Kasa reported that the stats admin page did not work when colons
were used in server or backend names. This was caused by url-encoding
resulting in ':' being sent as '%3A'. Now we systematically decode the
field names and values to fix this issue.
Now that we support the http-no-delay mode, we can optimize HTTP
chunking again by always waiting for more data to come until the
last chunk is met.
This patch may or may not be backported to 1.4, it's not a big deal,
it will mainly help for chunks which are aligned with the buffer size.
There are some very rare server-to-server applications that abuse the HTTP
protocol and expect the payload phase to be highly interactive, with many
interleaved data chunks in both directions within a single request. This is
absolutely not supported by the HTTP specification and will not work across
most proxies or servers. When such applications attempt to do this through
haproxy, it works but they will experience high delays due to the network
optimizations which favor performance by instructing the system to wait for
enough data to be available in order to only send full packets. Typical
delays are around 200 ms per round trip. Note that this only happens with
abnormal uses. Normal uses such as CONNECT requests nor WebSockets are not
affected.
When "option http-no-delay" is present in either the frontend or the backend
used by a connection, all such optimizations will be disabled in order to
make the exchanges as fast as possible. Of course this offers no guarantee on
the functionality, as it may break at any other place. But if it works via
HAProxy, it will work as fast as possible. This option should never be used
by default, and should never be used at all unless such a buggy application
is discovered. The impact of using this option is an increase of bandwidth
usage and CPU usage, which may significantly lower performance in high
latency environments.
This change should be backported to 1.4 since the first report of such a
misuse was in 1.4. Next patch will also be needed.
Commit 57f5c1 used to provide a nice improvement on chunked encoding since
it ensured that we did not set a PUSH flag for every chunk or buffer data
part of a chunked transfer.
Some applications appear to erroneously abuse HTTP chunking in order to
get interactive exchanges between a user agent and an origin server with
very small chunks. While it happens to work through haproxy, it's terribly
slow due to the latency added after passing each chunk to the system, who
could wait up to 200ms before pushing them onto the wire.
So we need an interactive mode for such usages. In the mean time, step back
on the optim, but not completely, so that we still keep the flag as long as
we know we're not finished with the current chunk.
This change should be backported to 1.4 too as the issue was discovered
with it.
This status code is used in response to requests matching "monitor-uri".
Some users need to adjust it to fit their needs (eg: make some strings
appear there). As it's already defined as a chunked string and used
exactly like other status codes, it makes sense to make it configurable
with the usual "errorfile", "errorloc", ...
Some people like to make the monitoring URL testable from unsafe locations.
Reporting haproxy's existence there can sometimes be problematic. This patch
should not be backported to 1.4 because it is possible, eventhough unlikely,
that some scripts rely on this word to appear there.
When doing fix 24581bae022bcf97ea7818e49ef27d21c92d6aa3 to correctly handle
response cookies, an unfortunate typo was inserted in the less likely code
path, resulting in a risk of crash when cookie-based persistence is enabled
and the server emits a cookie with several spaces around the equal sign.
This bug was noticed during a code backport. Its effects were never reported
because this situation is very unlikely to appear, but it can be provoked on
purpose by the server.
This patch must be backported to 1.4 versions which contain the fix above
(anything > 1.4.8), and to similar 1.3 versions > 1.3.25. 1.5-dev versions
after 1.5-dev2 are affected too.
Despite much care around handling the content-length as a 64-bit integer,
forwarding was broken on 32-bit platforms due to the 32-bit nature of
the ->to_forward member of the "buffer" struct. The issue is that this
member is declared as a long, so while it works OK on 64-bit platforms,
32-bit truncate the content-length to the lower 32-bits.
One solution could consist in turning to_forward to a long long, but it
is used a lot in the critical path, so it's not acceptable to perform
all buffer size computations on 64-bit there.
The fix consists in changing the to_forward member to a strict 32-bit
integer and ensure in buffer_forward() that only the amount of bytes
that can fit into it is considered. Callers of buffer_forward() are
responsible for checking that their data were taken into account. We
arbitrarily ensure we never consider more than 2G at once.
That's the way it was intended to work on 32-bit platforms except that
it did not.
This issue was tracked down hard at Exosec with Bertrand Jacquin,
Thierry Fournier and Julien Thomas. It remained undetected for a long
time because files larger than 4G are almost always transferred in
chunked-encoded format, and most platforms dealing with huge contents
these days run on 64-bit.
The bug affects all 1.5 and 1.4 versions, and must be backported.
Since we now have the copy of the target in the session, use it instead
of relying on the SI for it. The SI drops the target upon unregister()
so applets such as stats were logged as "NOSRV".
Johannes Smith reported some wrong retries count in logs associated with bad
requests. The cause was that the conn_retries field in the stream interface
was only initialized when attempting to connect, but is used when logging,
possibly with an uninitialized value holding last connection's conn_retries.
This could have been avoided by making use of a stream interface initializer.
This bug is 1.5-specific.
And also rename "req_acl_rule" "http_req_rule". At the beginning that
was a bit confusing to me, especially the "req_acl" list which in fact
holds what we call rules. After some digging, it appeared that some
part of the code is 100% HTTP and not just related to authentication
anymore, so let's move that part to HTTP and keep the auth-only code
in auth.c.
Right now, http-request rules are not evaluated if the URL matches the
stats request. This is quite unexpected. For instance, in the config
below, an abuser present in the abusers list will not be prevented access
to the stats.
listen pub
bind :8181
acl abuser src -f abusers.lst
http-request deny if abuser
stats uri /stats
It is not a big deal but it's not documented as such either. For 1.5, let's
have both lists be evaluated in turn, until one blocks. For 1.4 we'll simply
update the doc to indicate that.
Also instead of duplicating the code, the patch factors out the list walking
code. The HTTP auth has been moved slightly earlier, because it was set after
the header addition code, but we don't need to add headers to a request we're
dropping.
It's very annoying that frontend and backend stats are merged because we
don't know what we're observing. For instance, if a "listen" instance
makes use of a distinct backend, it's impossible to know what the bytes_out
means.
Some points take care of not updating counters twice if the backend points
to the frontend, indicating a "listen" instance. The thing becomes more
complex when we try to add support for server side keep-alive, because we
have to maintain a pointer to the backend used for last request, and to
update its stats. But we can't perform such comparisons anymore because
the counters will not match anymore.
So in order to get rid of this situation, let's have both frontend AND
backend stats in the "struct proxy". We simply update the relevant ones
during activity. Some of them are only accounted for in the backend,
while others are just for frontend. Maybe we can improve a bit on that
later, but the essential part is that those counters now reflect what
they really mean.
This patch turns internal server addresses to sockaddr_storage to
store IPv6 addresses, and makes the connect() function use it. This
code already works but some caveats with getaddrinfo/gethostbyname
still need to be sorted out while the changes had to be merged at
this stage of internal architecture changes. So for now the config
parser will not emit an IPv6 address yet so that user experience
remains unchanged.
This change should have absolutely zero user-visible effect, otherwise
it's a bug introduced during the merge, that should be reported ASAP.
This one has been removed and is now totally superseded by ->target.
To get the server, one must use target_srv(&s->target) instead of
s->srv now.
The function ensures that non-server targets still return NULL.
s->prev_srv is used by assign_server() only, but all code paths leading
to it now take s->prev_srv from the existing s->srv. So assign_server()
can do that copy into its own stack.
If at one point a different srv is needed, we still have a copy of the
last server on which we failed a connection attempt in s->target.
When dealing with HTTP keep-alive, we'll have to know if we can reuse
an existing connection. For that, we'll have to check if the current
connection was made on the exact same target (referenced in the stream
interface).
Thus, we need to first assign the next target to the session, then
copy it to the stream interface upon connect(). Later we'll check for
equivalence between those two operations.