Recently some browsers started to implement a "pre-connect" feature
consisting in speculatively connecting to some recently visited web sites
just in case the user would like to visit them. This results in many
connections being established to web sites, which end up in 408 Request
Timeout if the timeout strikes first, or 400 Bad Request when the browser
decides to close them first. These ones pollute the log and feed the error
counters. There was already "option dontlognull" but it's insufficient in
this case. Instead, this option does the following things :
- prevent any 400/408 message from being sent to the client if nothing
was received over a connection before it was closed ;
- prevent any log from being emitted in this situation ;
- prevent any error counter from being incremented
That way the empty connection is silently ignored. Note that it is better
not to use this unless it is clear that it is needed, because it will hide
real problems. The most common reason for not receiving a request and seeing
a 408 is due to an MTU inconsistency between the client and an intermediary
element such as a VPN, which blocks too large packets. These issues are
generally seen with POST requests as well as GET with large cookies. The logs
are often the only way to detect them.
This patch should be backported to 1.5 since it avoids false alerts and
makes it easier to monitor haproxy's status.
There's not much reason for continuing to accept HTTP/0.9 requests
nowadays except for manual testing. Now we disable support for these
by default, unless option accept-invalid-http-request is specified,
in which case they continue to be upgraded to 1.0.
While RFC2616 used to allow an undeterminate amount of digits for the
major and minor components of the HTTP version, RFC7230 has reduced
that to a single digit for each.
If a server can't properly parse the version string and falls back to 0.9,
it could then send a head-less response whose payload would be taken for
headers, which could confuse downstream agents.
Since there's no more reason for supporting a version scheme that was
never used, let's upgrade to the updated version of the standard. It is
still possible to enforce support for the old behaviour using options
accept-invalid-http-request and accept-invalid-http-response.
It would be wise to backport this to 1.5 as well just in case.
The spec mandates that content-length must be removed from messages if
Transfer-Encoding is present, not just for valid ones.
This must be backported to 1.5 and 1.4.
The rules related to how to handle a bad transfer-encoding header (one
where "chunked" is not at the final place) have evolved to mandate an
abort when this happens in the request. Previously it was only a close
(which is still valid for the server side).
This must be backported to 1.5 and 1.4.
While Transfer-Encoding is HTTP/1.1, we must still parse it in HTTP/1.0
in case an agent sends it, because it's likely that the other side might
use it as well, causing confusion. This will also result in getting rid
of the Content-Length header in such abnormal situations and in having
a clean connection.
This must be backported to 1.5 and 1.4.
RFC7230 clarified the behaviour to adopt when facing both a
content-length and a transfer-encoding: chunked in a message. While
haproxy already complied with the method for getting the message
length right, and used to detect improper content-length duplicates,
it still did not remove the content-length header when facing a
transfer-encoding: chunked. Usually it is not a problem since other
agents (clients and servers) are required to parse the message
according to the rules that have been in place since RFC2616 in
1999.
However Régis Leroy reported the existence of at least one such
non-compliant agent so haproxy could be abused to get out of sync
with it on pipelined requests (HTTP request smuggling attack),
it consider part of a payload as a subsequent request.
The best thing to do is then to remove the content-length according
to RFC7230. It used to be in the todo list with a fixme in the code
while waiting for the standard to stabilize, let's apply it now that
it's published.
Thanks to Régis for bringing that subject to our attention.
This fix must be backported to 1.5 and 1.4.
When one of these functions replaces a part of the query string by
a shorter or longer new one, the header parsing is broken. This is
because the start of the first header is not updated.
In the same way, the total length of the request line is not updated.
I dont see any bug caused by this miss, but I guess than it is better
to store the good length.
This bug is only in the development version.
Commit 350f487 ("CLEANUP: session: simplify references to chn_{prod,cons}(&s->{req,res})")
introduced a regression causing the cli_conn to be picked from the server
side instead of the client side, so the XFF header is not appended anymore
since the connection is NULL.
Thanks to Reinis Rozitis for reporting this bug. No backport is needed
as it's 1.6-specific.
Commit bc4c1ac ("MEDIUM: http/tcp: permit to resume http and tcp custom
actions") introduced the ability to interrupt and restart processing in
the middle of a TCP/HTTP ruleset. But it doesn't do it in a consistent
way : it checks current_rule_list, immediately dereferences current_rule,
which is only set in certain cases and never cleared. So that broke the
tcp-request content rules when the processing was interrupted due to
missing data, because current_rule was not yet set (segfault) or could
have been inherited from another ruleset if it was used in a backend
(random behaviour).
The proper way to do it is to always set current_rule before dereferencing
it. But we don't want to set it for all rules because we don't want any
action to provide a checkpointing mechanism. So current_rule is set to NULL
before entering the loop, and only used if not NULL and if current_rule_list
matches the current list. This way they both serve as a guard for the other
one. This fix also makes the current rule point to the rule instead of its
list element, as it's much easier to manipulate.
No backport is needed, this is 1.6-specific.
This patch adds support for error codes 429 and 405 to Haproxy and a
"deny_status XXX" option to "http-request deny" where you can specify which
code is returned with 403 being the default. We really want to do this the
"haproxy way" and hope to have this patch included in the mainline. We'll
be happy address any feedback on how this is implemented.
Many such function need a session, and till now they used to dereference
the stream. Once we remove the stream from the embryonic session, this
will not be possible anymore.
So as of now, sample fetch functions will be called with this :
- sess = NULL, strm = NULL : never
- sess = valid, strm = NULL : tcp-req connection
- sess = valid, strm = valid, strm->txn = NULL : tcp-req content
- sess = valid, strm = valid, strm->txn = valid : http-req / http-res
The registerable http_req_rules / http_res_rules used to require a
struct http_txn at the end. It's redundant with struct stream and
propagates very deep into some parts (ie: it was the reason for lua
requiring l7). Let's remove it now.
All of them can now retrieve the HTTP transaction *if it exists* from
the stream and be sure to get NULL there when called with an embryonic
session.
The patch is a bit large because many locations were touched (all fetch
functions had to have their prototype adjusted). The opportunity was
taken to also uniformize the call names (the stream is now always "strm"
instead of "l4") and to fix indent where it was broken. This way when
we later introduce the session here there will be less confusion.
Now this one is dynamically allocated. It means that 280 bytes of memory
are saved per TCP stream, but more importantly that it will become
possible to remove the l7 pointer from fetches and converters since
it will be deduced from the stream and will support being null.
A lot of care was taken because it's easy to forget a test somewhere,
and the previous code used to always trust s->txn for being valid, but
all places seem to have been visited.
All HTTP fetch functions check the txn first so we shouldn't have any
issue there even when called from TCP. When branching from a TCP frontend
to an HTTP backend, the txn is properly allocated at the same time as the
hdr_idx.
This one will not necessarily be allocated for each stream, and we want
to use the fact that it equals null to know it's not present so that we
can always deduce its presence from the stream pointer.
This commit only creates the new pool.
The header captures are now general purpose captures since tcp rules
can use them to capture various contents. That removes a dependency
on http_txn that appeared in some sample fetch functions and in the
order by which captures and http_txn were allocated.
Interestingly the reset of the header captures were done at too many
places as http_init_txn() used to do it while it was done previously
in every call place.
The stream may never be null given that all these functions are called
from sample_process(). Let's remove this now confusing test which
sometimes happens after a dereference was already done.
When s->si[0].end was dereferenced as a connection or anything in
order to retrieve information about the originating session, we'll
now use sess->origin instead so that when we have to chain multiple
streams in HTTP/2, we'll keep accessing the same origin.
Just like for the listener, the frontend is session-wide so let's move
it to the session. There are a lot of places which were changed but the
changes are minimal in fact.
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.
In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.
The files stream.{c,h} were added and session.{c,h} removed.
The session will be reintroduced later and a few parts of the stream
will progressively be moved overthere. It will more or less contain
only what we need in an embryonic session.
Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.
Once all changes are completed, we should see approximately this :
L7 - http_txn
L6 - stream
L5 - session
L4 - connection | applet
There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.
Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
Commit a0dc23f ("MEDIUM: http: implement http-request set-{method,path,query,uri}")
forgot to null-terminate the list, resulting in crashes when these actions
are used if the platform doesn't pad the struct with nulls.
Thanks to Gunay Arslan for reporting a detailed trace showing the
origin of this bug.
No backport to 1.5 is needed.
It's documented that these sample fetch functions should count all headers
and/or all values when called with no name but in practice it's not what is
being done as a missing name causes an immediate return and an absence of
result.
This bug is present in 1.5 as well and must be backported.
Thanks to MSIE/IIS, the "deflate" name is ambigous. According to the RFC
it's a zlib-wrapped deflate stream, but IIS used to send only a raw deflate
stream, which is the only format MSIE understands for "deflate". The other
widely used browsers do support both formats. For this reason some people
prefer to emit a raw deflate stream on "deflate" to serve more users even
it that means violating the standards. Haproxy only follows the standard,
so they cannot do this.
This patch makes it possible to have one algorithm name in the configuration
and another one in the protocol. This will make it possible to have a new
configuration token to add a different algorithm so that users can decide if
they want a raw deflate or the standard one.
This function is a callback for HTTP actions. This function
creates the replacement string from a build_logline() format
and transform the header.
This patch split this function in two part. With this modification,
the header transformation and the replacement string are separed.
We can now transform the header with another replacement string
source than a build_logline() format.
The first part is the replacement engine. It take a replacement action
number and a replacement string and process the action.
The second part is the function which is called by the 'http-request
action' to replace a request line part. This function makes the
string used as replacement.
This split permits to use the replacement engine in other parts of the
code than the request action. The Lua use it for his own http action.
These function used an invalid header parser.
- The trailing white-spaces were embedded in the replacement regex,
- The double-quote (") containing comma (,) were not respected.
This patch replace this parser by the "official" parser http_find_header2().
The function http_replace_value use bad variable to detect the end
of the input string.
Regression introduced by the patch "MEDIUM: regex: Remove null
terminated strings." (c9c2daf2)
We need to backport this patch int the 1.5 stable branch.
WT: there is no possibility to overwrite existing data as we only read
past the end of the request buffer, to copy into the trash. The copy
is bounded by buffer_replace2(), just like the replacement performed
by exp_replace(). However if a buffer happens to contain non-zero data
up to the next unmapped page boundary, there's a theorical risk of
crashing the process despite this not being reproducible in tests.
The risk is low because "http-request replace-value" did not work due
to this bug so that probably means it's not used yet.
This bug is introduced by the commit "MEDIUM: http/tcp: permit to
resume http and tcp custom actions" ( bc4c1ac6ad3dfd5bb ).
Before this patch, the return code of the function was ignored.
After this path, if the function returns 0, it wats a YIELD.
The function http_action_set_req_line() retunrs 0, in succes case.
This patch changes the return code of this function.
It is common for rest applications to return status codes other than
200, so compress the other common 200 level responses which might
contain content.
These 4 combinations are needlessly complicated since the session already
has direct access to the associated stream interfaces without having to
check an indirect pointer.
The purpose of these two macros will be to pass via the session to
find the relevant stream interfaces so that we don't need to store
the ->cons nor ->prod pointers anymore. Currently they're only defined
so that all references could be removed.
Note that many places need a second pass of clean up so that we don't
have any chn_prod(&s->req) anymore and only &s->si[0] instead, and
conversely for the 3 other cases.
This new flag "SI_FL_ISBACK" is set only on the back SI and is cleared
on the front SI. That way it's possible only by looking at the SI to
know what side it is.
We'll soon remove direct references to the channels from the stream
interface since everything belongs to the same session, so let's
first not dereference si->ib / si->ob anymore and use macros instead.
The channels were pointers to outside structs and this is not needed
anymore since the buffers have moved, but this complicates operations.
Move them back into the session so that both channels and stream interfaces
are always allocated for a session. Some places (some early sample fetch
functions) used to validate that a channel was NULL prior to dereferencing
it. Now instead we check if chn->buf is NULL and we force it to remain NULL
until the channel is initialized.
Commit bc4c1ac ("MEDIUM: http/tcp: permit to resume http and tcp custom actions")
unfortunately broke the stats applet by moving the clearing of the analyser bit
after processing the applet headers. It used to work only in HTTP/1.1 and not
in HTTP/1.0. This is 1.6-specific, no backport is needed.
Later, the processing of some actions needs to be interrupted and resumed
later. This patch permit to resume the actions. The actions that needs
to run with the resume mode are not yet avalaible. It will be soon with
Lua patches. So the code added by this patch is untestable for the moment.
The list of "tcp_exec_req_rules" cannot resme because is called by the
unresumable function "accept_session".
Actually, this function returns a pointer on the rule that stop
the evaluation of the rule list. Later we integrate the possibility
of interrupt and resue the processsing of some actions. The current
response mode is not sufficient to returns the "interrupt" information.
The pointer returned is never used, so I change the return type of
this function by an enum. With this enum, the function is ready to
return the "interupt" value.
The functions "val_payload_lv" and "val_hdr" are useful with
lua. The lua automatic binding for sample fetchs needs to
compare check functions.
The "arg_type_names" permit to display error messages.
Some usages of the converters need to know the attached session. The Lua
needs the session for retrieving his running context. This patch adds
the "session" as an argument of the converters prototype.