Commit f003d37 ("BUG/MINOR: http: don't report client aborts as server errors")
attempted to fix a longstanding issue by which some client aborts could be
logged as server errors. Unfortunately, one of the tests involved there also
catches truncated server responses, which are reported as client aborts.
Instead, only check that the client has really closed using the abortonclose
option, just as in done in the request path (which means that the close was
propagated to the server).
The faulty fix above was introduced in 1.5-dev15, and was backported into
1.4.23.
Thanks to Patrick Hemmer for reporting this issue with traces showing the
root cause of the problem.
Commit ad90351 ("MINOR: http: Add the "language" converter to for use with accept-language")
introduced a typo in parse_qvalue :
if (*end)
*end = qvalue;
while it should be :
if (end)
*end = qvalue;
Since end is tested for being NULL. This crashes when selecting the
compression algorithm since end is NULL here. No backport is needed,
this is just in latest 1.5-dev.
The forwarding code is never obvious to enter into for newcomers, so
better improve the documentation about how states are chained and what
happens for each of them.
Doing so avoids calling channel_forward() for each part of the chunk
parsing and lowers the number of calls to channel_forward() to only
one per buffer, resulting in about 11% performance increase on small
chunks forwarding rate.
The call to flush the compression buffers only needs to be done when
entering the final states or when leaving with missing data. After
that, if trailers are present, they have to be forwarded.
Now we have valid buffer offsets, we can use them to safely parse the
input and only forward when needed. Thus we can get rid of the
consumed_data accumulator, and the code now works both for chunked and
content-length, even with a server feeding one byte at a time (which
systematically broke the previous one).
It's worth noting that 0<CRLF> must always be sent after end of data
(ie: chunk_len==0), and that the trailing CRLF is sent only content
length mode, because in chunked we'll have to pass trailers.
This is basically a revert of commit 667c2a3 ("BUG/MAJOR: http: compression
still has defects on chunked responses").
The latest changes applied to message pointers should have got rid of all
the issues that were making the compression of partial chunks unreliable.
Currently, we forward headers only if the incoming message is still before
HTTP_MSG_CHUNK_SIZE, otherwise they'll be considered as data. In practice
this is always true for the response since there's no data inspection, and
for the request there is no compression so there's no problem with forwarding
them as data.
But the principle is incorrect and will make it difficult to later add data
processing features. So better fix it now.
The new principle is simple :
- if headers were not yet forwarded, forward them now.
- while doing so, check if we need to update the state
If for some reason, the compression returns an error, the compression
is not deinitialized which also means that any pending data are not
flushed and could be lost, especially in the chunked-encoded case.
No backport is needed.
When a parsing error was encountered in a chunked response, we failed
to properly deinitialize the compression context. There was no impact
till now since compression of chunked responses was disabled. No backport
is needed.
Thanks to the last updates on the message pointers, it is now safe again to
enable forwarding of the request headers while waiting for the connection to
complete because we know how to safely rewind this part.
So this patch slightly modifies what was done in commit 80a92c0 ("BUG/MEDIUM:
http: don't start to forward request data before the connect") to let up to
msg->sov bytes be forwarded when waiting for the connection. The resulting
effect is that a POST request may now be sent with the connect's ACK, which
still saves a packet and may even be useful later when TFO is supported.
In order to avoid abusively relying on buf->o to guess how many bytes to
rewind during a redispatch, we now clear msg->sov. Thus the meaning of this
field is exactly "how many bytes of headers are left to be forwarded". It
is still possible to rewind because msg->eoh + msg->eol equal that value
before scheduling the forwarding, so we can always subtract them.
http_hdr_rewind() returns the number of bytes to rewind before buf->p to
find the beginning of headers. At the moment it's not exact as it still
relies on buf->o, assuming that no other data from a past message were
pending there, but it's what was done till there.
The purpose is to centralize further ->sov changes aiming at avoiding
to rely on buf->o.
We used to have msg->sov updated for every chunk that was parsed. The issue
is that we want to be able to rewind after chunks were parsed in case we need
to redispatch a request and perform a new hash on the request or insert a
different server header name.
Currently, msg->sov and msg->next make parallel progress. We reached a point
where they're always equal because msg->next is initialized from msg->sov,
and is subtracted msg->sov's value each time msg->sov bytes are forwarded.
So we can now ensure that msg->sov can always be replaced by msg->next for
every state after HTTP_MSG_BODY where it is used as a position counter.
This allows us to keep msg->sov untouched whatever the number of chunks that
are parsed, as is needed to extract data from POST request (eg: url_param).
However, we still need to know the starting position of the data relative to
the body, which differs by the chunk size length. We use msg->sol for this
since it's now always zero and unused in the body.
So with this patch, we have the following situation :
- msg->sov = msg->eoh + msg->eol = size of the headers including last CRLF
- msg->sol = length of the chunk size if any. So msg->sov + msg->sol = DATA.
- msg->next corresponds to the byte being inspected based on the current
state and is always >= msg->sov before starting to forward anything.
Since sov and next are updated in case of header rewriting, a rewind will
fix them both when needed. Of course, ->sol has no reason for changing in
such conditions, so it's fine to keep it relative to msg->sov.
In theory, even if a redispatch has to be performed, a transformation
occurring on the request would still work because the data moved would
still appear at the same place relative to bug->p.
This function is only a parser, it must start to parse at the next character
and only update the outgoing relative pointers, but not expect the buffer to
be aligned with the next byte to be parsed.
It's important to fix this otherwise we cannot use this function to parse
chunks without starting to forward data.
There are still some pending issues in the gzip compressor, and fixing
them requires a better handling of intermediate parsing states.
Another issue to deal with is the rewinding of a buffer during a redispatch
when a load balancing algorithm involves L7 data because the exact amount of
data to rewind is not clear. At the moment, this is handled by unwinding all
pending data, which cannot work in responses due to pipelining.
Last, having a first analysis which parses the body and another one which
restarts from where the parsing was left is wrong. Right now it only works
because we never both parse and transform in the same direction. But that
is wrong anyway.
In order to address the first issue, we'll have to use msg->eoh + msg->eol
to find the end of headers, and we still need to store the information about
the forwarded header length somewhere (msg->sol might be reused for this).
msg->sov may only be used for the start of data and not for subsequent chunks
if possible. This first implies that we stop sharing it with header length,
and stop using msg->sol there. In fact we don't need it already as it is
always zero when reaching the HTTP_MSG_BODY state. It was only updated to
reflect a copy of msg->sov.
So now as a first step into that direction, this patch ensure that msg->sol
is never re-assigned after being set to zero and is not used anymore when
we're dealing with HTTP processing and forwarding. We'll later reuse it
differently but for now it's secured.
The patch does nothing magic, it only removes msg->sol everywhere it was
already zero and avoids setting it. In order to keep the sov-sol difference,
it now resets sov after forwarding data. In theory there's no problem here,
but the patch is still tagged major because that code is complex.
One of the issues we face when we need to either forward headers only
before compressing, or rewind the stream during a redispatch is to know
the proper length of the request headers. msg->eoh always has the total
length up to the last CRLF, and we never know whether the request ended
with a single LF or a standard CRLF. This makes it hard to rewind the
headers without explicitly checking the bytes in the buffer.
Instead of doing so, we now use msg->eol to carry the length of the last
CRLF (either 1 or 2). Since it is not modified at all after HTTP_MSG_BODY,
and was only left in an undefined state, it is safe to use at any moment.
Thus, the complete header length to forward or to rewind now is always
msg->eoh + msg->eol.
Content-length encoded message bodies are trivial to deal with, but
chunked-encoded will require improvements, so let's separate the code
flows between the two to ease next steps. The behaviour is not changed
at all, the code is only rearranged.
This is the continuation of previous patch. Now that full buffers are
not rejected anymore, let's wait for at least the advertised chunk or
body length to be present or the buffer to be full. When either
condition is met, the message processing can go forward.
Thus we don't need to use url_param_post_limit anymore, which was passed
in the configuration as an optionnal <max_wait> parameter after the
"check_post" value. This setting was necessary when the feature was
implemented because there was no support for parsing message bodies.
The argument is now silently ignored if set in the configuration.
http_process_request_body() currently expects a request body containing
exactly an expected message body. This was done in order to support load
balancing on a unique POST parameter but the way it's done still suffers
from some limitations. One of them is that there is no guarantee that the
accepted message will contain the appropriate string if it starts with
another parameter. But at the same time it will reject a message when the
buffer is full.
So as a first step, we don't reject anymore message bodies that fill the
buffer.
language(<value[;value[;value[;...]]]>[,<default>])
Returns the value with the highest q-factor from a list as
extracted from the "accept-language" header using "req.fhdr".
Values with no q-factor have a q-factor of 1. Values with a
q-factor of 0 are dropped. Only values which belong to the
list of semi-colon delimited <values> will be considered. If
no value matches the given list and a default value is
provided, it is returned. Note that language names may have
a variant after a dash ('-'). If this variant is present in
the list, it will be matched, but if it is not, only the base
language is checked. The match is case-sensitive, and the
output string is always one of those provided in arguments.
The ordering of arguments is meaningless, only the ordering
of the values in the request counts, as the first value among
multiple sharing the same q-factor is used.
Example :
# this configuration switches to the backend matching a
# given language based on the request :
acl de req.fhdr(accept-language),language(de;es;fr;en) de
acl es req.fhdr(accept-language),language(de;es;fr;en) es
acl fr req.fhdr(accept-language),language(de;es;fr;en) fr
acl en req.fhdr(accept-language),language(de;es;fr;en) en
use_backend german if de
use_backend spanish if es
use_backend french if fr
use_backend english if en
default_backend choose_your_language
OpenBSD complains about this use of sprintf() :
src/proto_http.o(.text+0xb0e6): In function `http_process_request':
src/proto_http.c:4127: warning: sprintf() is often misused, please use snprintf()
Here there's no risk as the strings are way shorter than the buffer size
but let's fix it anyway.
RFC 1945 (4.1) defines an HTTP/0.9 request ("Simple-Request") as:
Simple-Request = "GET" SP Request-URI CRLF
HAProxy tries to automatically upgrade HTTP/0.9 requests to
to HTTP/1.0, by appending "HTTP/1.0" to the request and setting the
Request-URI to "/" if it was not present. The latter however is
RFC-incompatible, as HTTP/0.9 requests must already have a Request-URI
according to the definition above. Additionally,
http_upgrade_v09_to_v10() does not check whether the request method is
indeed GET (the mandatory method for HTTP/0.9).
As a result, any single- or double-word request line is regarded as a
valid HTTP request. We fix this by failing in http_upgrade_v09_to_v10()
if the request method is not GET or the request URI is not present.
The function url2sa() converts faster url like http://<ip>:<port> in a
struct sockaddr_storage. This patch add:
- the https support
- permit to return the length parsed
- support IPv6
- support DNS synchronous resolution only during start of haproxy.
The faster IPv4 convertion way is keeped. IPv6 is slower, because I use
the standard IPv6 parser function.
Till now we didn't consider "q=". It's problematic because the first
effect is that compression tokens were not even matched if it was
present.
It is important to parse it correctly because we still want to allow
a user-agent to send "q=0" to explicitly disable a compressor, or to
specify its preferences.
Now, q-values are respected in order of precedence, and when several
q-values are equal, the first occurrence is used.
This patch replace a lot of pointeur by pattern matching identifier. If
the declared ACL use all the predefined pattern matching functions, the
register function gets the functions provided by "pattern.c" and
identified by the PAT_LATCH_*.
In the case of the acl uses his own functions, they can be declared, and
the acl registration doesn't change it.
The find_smp search the smp using the value of the pat_ref_elt pointer.
The pat_find_smp_* are no longer used. The function pattern_find_smp()
known all pattern indexation, and can be found
All the pattern delete function can use her reference to the original
"struct pat_ref_elt" to find the element to be remove. The functions
pat_del_list_str() and pat_del_meth() were deleted because after
applying this modification, they have the same code than pat_del_list_ptr().
This patch extract the expect_type variable from the "struct pattern" to
"struct pattern_head". This variable is set during the declaration of
ACL and MAP. With this change, the function "pat_parse_len()" become
useless and can be replaced by "pat_parse_int()".
Implicit ACLs by default rely on the fetch's output type, so let's simply do
the same for all other ones. It has been verified that they all match.
Some functions needs to change the sample associated to pattern. This
new pointer permit to return the a pointer to the sample pointer. The
caller can use or change the value.
This commit adds a delete function for patterns. It looks up all
instances of the pattern to delete and deletes them all. The fetch
keyword declarations have been extended to point to the appropriate
delete function.
The match function known the format of the pattern. The pattern can be
stored in a list or in a tree. The pattern matching function use itself
the good entry point and indexation type.
Each pattern matching function return the struct pattern that match. If
the flag "fill" is set, the struct pattern is filled, otherwise the
content of this struct must not be used.
With this feature, the general pattern matching function cannot have
exceptions for building the "struct pattern".
The method are actuelly stored using two types. Integer if the method is
known and string if the method is not known. The fetch is declared as
UINT, but in some case it can provides STR.
This patch create new type called METH. This type contain interge for
known method and string for the other methods. It can be used with
automatic converters.
The pattern matching can expect method.
During the free or prune function, http_meth pettern is freed. This
patch initialise the freed pointer to NULL.
The operations applied on types SMP_T_CSTR and SMP_T_STR are the same,
but the check code and the declarations are double, because it must
declare action for SMP_T_C* and SMP_T_*. The declared actions and checks
are the same. this complexify the code. Only the "conv" functions can
change from "C*" to "*"
Now, if a function needs to modify input string, it can call the new
function smp_dup(). This one duplicate data in a trash buffer.
The pattern parse functions put the parsed result in a "struct pattern"
without memory allocation. If the pattern must reference the input data
without changes, the pattern point to the parsed string. If buffers are
needed to store translated data, it use th trash buffer. The indexation
function that allocate the memory later if it is needed.
Before this patch, the indexation function check the declared patttern
matching function and index the data according with this function. This
is not useful to add some indexation mode.
This commit adds dedicated indexation function. Each struct pattern is
associated with one indexation function. This function permit to index
data according with the type of pattern and with the type of match.
After the previous patches, the "pat_parse_strcat()" function disappear,
and the "pat_parse_int()" and "pat_parse_dotted_ver()" functions dont
use anymore the "opaque" argument, and take only one string on his
input.
So, after this patch, each pattern parser no longer use the opaque
variable and take only one string as input. This patch change the
prototype of the pattern parsing functions.
Now, the "char **args" is replaced by a "char *arg", the "int *opaque"
is removed and these functions return 1 in succes case, and 0 if fail.
This patch remove the limit of 32 groups. It also permit to use standard
"pat_parse_str()" function in place of "pat_parse_strcat()". The
"pat_parse_strcat()" is no longer used and its removed. Before this
patch, the groups are stored in a bitfield, now they are stored in a
list of strings. The matching is slower, but the number of groups is
low and generally the list of allowed groups is short.
The fetch function "smp_fetch_http_auth_grp()" used with the name
"http_auth_group" return valid username. It can be used as string for
displaying the username or with the acl "http_auth_group" for checking
the group of the user.
Maybe the names of the ACL and fetch methods are no longer suitable, but
I keep the current names for conserving the compatibility with existing
configurations.
The function "userlist_postinit()" is created from verification code
stored in the big function "check_config_validity()". The code is
adapted to the new authentication storage system and it is moved in the
"src/auth.c" file. This function is used to check the validity of the
users declared in groups and to check the validity of groups declared
on the "user" entries.
This resolve function is executed before the check of all proxy because
many acl needs solved users and groups.
The binary samples are sometimes copied as is into http headers.
A sample can contain bytes unallowed by the http rfc concerning
header content, for example if it was extracted from binary data.
The resulting http request can thus be invalid.
This issue does not yet happen because haproxy currently (mistakenly)
hex-encodes binary data, so it is not really possible to retrieve
invalid HTTP chars.
The solution consists in hex-encoding all non-printable chars prefixed
by a '%' sign.
No backport is needed since existing code is not affected yet.
Currently there are two places where the compression context is released,
one in session_free() and another one in http_end_txn_clean_session().
Both of them call http_end_txn(), either directly or via http_reset_txn(),
and this function is made for this exact purpose. So let's centralize the
call there instead.
Currently, "balance url_param check_post" randomly works. If the client
sends chunked data and there's another chunk after the one containing the
data, http_request_forward_body() will advance msg->sov and move the start
of data to the beginning of the last chunk, and get_server_ph_post() will
not find the data.
In order to avoid this, we add an HTTP_MSGF_WAIT_CONN flag whose goal is
to prevent the forwarding code from parsing until the connection is
confirmed, so that we're certain not to fail on a redispatch. Note that
we need to force channel_auto_connect() since the output buffer is empty
and a previous analyser might have stopped auto-connect.
The flag is currently set whenever some L7 POST analysis is needed for a
connect() so that it correctly addresses all corner cases involving a
possible rewind of the buffer, waiting for a better fix.
Note that this has been broken for a very long time. Even all 1.4 versions
seem broken but differently, with ->sov pointing to the end of the arguments.
So the fix should be considered for backporting to all stable releases,
possibly including 1.3 which works differently.
Finn Arne Gangstad reported that commit 6b726adb35 ("MEDIUM: http: do
not report connection errors for second and further requests") breaks
support for serving static files by abusing the errorfile 503 statement.
Indeed, a second request over a connection sent to any server or backend
returning 503 would silently be dropped.
The proper solution consists in adding a flag on the session indicating
that the server connection was reused, and to only avoid the error code
in this case.