This patch removes the generic opaque type for storing the configuration of the
action "set-src" (HTTP_REQ_ACT_SET_SRC), and uses the dedicated type "struct expr" instead.
This patch is the first of a series which merges all the action structs. The
"tcp-request content", "tcp-response content", "http-request" and
"http-response" rule sets have the same values and the same processing for some
defined actions, but the structs and the prototypes of the declared functions
are different.
This patch tries to unify all of these entries.
The union name "data" is a little bit heavy to read in the source
code because it produces expressions like "data.data.sint". Renaming it
from "data" to "u" makes reading easier, as in "data.u.sint".
This patch removes the information stored both in the struct
sample_data and in the struct sample. Now, only the struct sample_data
contains data, and the struct sample uses the struct sample_data for storing
its own data.
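As a rough illustration, this is approximately what the unified layout looks
like after the two patches above (a hedged sketch; the member list is abridged
and follows the commit text, not the tree):

    #include <netinet/in.h>

    /* sketch only: the type and value now live in struct sample_data,
     * and struct sample embeds it instead of duplicating the fields */
    struct sample_data {
        int type;                 /* SMP_T_* type identifier */
        union {
            long long int sint;   /* 64-bit signed integer */
            struct in_addr ipv4;
            struct in6_addr ipv6;
        } u;                      /* renamed from "data" to "u" */
    };

    struct sample {
        unsigned int flags;       /* SMP_F_* flags */
        struct sample_data data;  /* so code reads smp->data.u.sint */
    };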
appsessions started to be deprecated with the introduction of stick
tables, and the latter are much more powerful and flexible, and in
addition they are replicated between nodes and maintained across
reloads. Let's now remove appsession completely.
This strategy is less extreme than "always", it only dispatches first
requests to validated reused connections, and moves a connection from
the idle list to the safe list once it has seen a second request, thus
proving that it could be reused.
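A hedged configuration sketch of selecting this strategy (assuming the
"http-reuse" directive this series introduces; server address is illustrative):

    backend app
        # "safe" only reuses connections that have already proven
        # reusable; first requests otherwise get their own connection
        http-reuse safe
        server srv1 192.168.0.10:8080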
In connect_server(), if we don't have a connection attached to the
stream-int, we first look into the server's idle_conns list and pick
the first one there, detaching it from its owner if it had one.
If we used to have a connection, we close it.
This mechanism works well but doesn't scale : as the number of servers
increases, the likelihood that the connection attached to the stream
interface doesn't match the server, and thus gets closed, increases.
This flag is set on an outgoing connection when this connection gets
some properties that must not be shared with other connections, such
as dynamic transparent source binding, SNI or a proxy protocol header,
or an authentication challenge from the server. This will be needed
later to implement connection reuse.
This function is now dedicated to idle connections only, which means
that it must not be used without an endpoint nor with anything that is
not a connection. The connection remains attached to the stream interface.
Since we now always call this function with the reuse parameter cleared,
let's simplify the function's logic as it cannot return the existing
connection anymore. The savings on this inline function are appreciable
(240 bytes) :
$ size haproxy.old haproxy.new
   text    data     bss     dec     hex filename
1020383   40816   36928 1098127  10c18f haproxy.old
1020143   40816   36928 1097887  10c09f haproxy.new
This patch removes the 32-bit unsigned integer and the 32-bit signed
integer. It replaces these types with a single 64-bit signed type.
This makes integer usage easier and clarifies signed and unsigned use.
With the previous version, signed and unsigned values were used one in
place of the other, and sometimes a converter lost the sign. For example,
divisions were processed as "unsigned": if one operand was negative, the
result was wrong.
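A small standalone C example of the sign loss described above (independent of
haproxy's converters):

    #include <stdio.h>

    int main(void)
    {
        /* in 32-bit unsigned arithmetic, -4 wraps to 4294967292,
         * so the division result is meaningless */
        unsigned int ua = (unsigned int)-4;
        printf("unsigned 32: %u\n", ua / 2u);   /* 2147483646 */

        /* with a single 64-bit signed type the sign is preserved */
        long long sa = -4;
        printf("signed 64:   %lld\n", sa / 2);  /* -2 */
        return 0;
    }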
Note that the integer pattern matching and dotted version pattern matching
are already working with signed 64 bits integer values.
There is one user-visible change : the "uint()" and "sint()" sample fetch
functions which used to return a constant integer have been replaced with
a new, more natural and unified "int()" function. These functions were only
introduced in the latest 1.6-dev2, so there's no impact on regular
deployments.
The man page says that gmtime() and localtime() can return a NULL value.
This was not tested. It appears that all the values of a 32-bit integer
are valid, but it is better to check the return value of these functions.
Moreover, once the integer moves from 32 bits to 64 bits, some 64-bit
values may be unsupported.
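A minimal sketch of the defensive pattern (hedged; the real call sites differ):

    #include <time.h>

    /* return 0 instead of dereferencing a NULL struct tm, which may
     * happen for out-of-range (notably 64-bit) time values */
    static int safe_gmtime(time_t when, struct tm *out)
    {
        struct tm *tm = gmtime(&when);

        if (!tm)
            return 0;
        *out = *tm;
        return 1;
    }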
This option enables overriding the source IP address in an HTTP request. It is
useful when we want to set a custom source IP (e.g. a front proxy rewrites the
address, but provides the correct one in headers) or we want to mask the source
IP address for privacy or compliance.
It acts on any expression which produces a correct IP address.
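A hedged configuration example (the header name is illustrative):

    frontend fe
        # trust the address advertised by the front proxy rather than
        # the TCP source; any expression returning an IP address works
        http-request set-src hdr(x-forwarded-for)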
This modification makes it possible to use sample_fetch_string() in more
places, where we might need to fetch sample values which are not plain strings.
This way we don't need to fetch a string and convert it into another type
afterwards.
When using aliased types, the caller should explicitly check which exact type
was returned (e.g. SMP_T_IPV4 or SMP_T_IPV6 for SMP_T_ADDR).
All usages of sample_fetch_string() are converted to use the new function.
This patch adds support for variables during the processing of each stream. The
variable scope can be set as 'session', 'transaction', 'request' or 'response'.
The variable type is the type returned by the assignment expression. The type
can change during processing.
The allocated memory can be controlled for each scope and each request, and for
the global process.
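A hedged usage sketch (variable names are illustrative; the configuration is
assumed to use the short scope prefixes):

    frontend fe
        # store the Host header in a transaction-scoped variable
        http-request set-var(txn.host) req.hdr(host)
        use_backend static if { var(txn.host) -m end .example.com }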
This patch permits registering a new keyword for "tcp-request content",
"tcp-request connection", "tcp-response content", "http-request" and
"http-response" which is identified only by matching the start of the keyword.
For example, we register the keyword "set-var" with the option "match_pfx",
and the configuration keyword "set-var(var_name)" then matches this entry.
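A hedged sketch of such a registration (struct and function names mirror the
action keyword lists this series introduces, and are assumptions here):

    /* illustrative only: with .match_pfx set, the parser accepts any
     * configuration keyword starting with "set-var", so the entry
     * below also matches "set-var(var_name)" */
    static struct action_kw_list http_req_kws = { .kw = {
        { .kw = "set-var", .parse = parse_http_set_var, .match_pfx = 1 },
        { /* END */ }
    }};

    static void __http_req_kws_init(void)
    {
        http_req_keywords_register(&http_req_kws);
    }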
Commit 9fbe18e ("MEDIUM: http: add a new option http-buffer-request")
introduced a regression due to a misplaced check causing the admin
mode of the HTTP stats not to work anymore.
That patch tried to ensure that when we need a request body for the
stats applet, and we have already waited for this body, we don't wait
for it again; but the condition was applied too early, disabling the
entire processing of the body, and was based on the wrong HTTP state
(MSG_BODY), resulting in the test never matching.
Thanks to Chad Lavoie for reporting the problem.
This bug is 1.6-only, no backport is needed.
There are two reasons for not keeping the client connection alive upon a
redirect :
- save the client from uploading all data
- avoid keeping a connection alive if the redirect goes to another domain
The first case should consider an exception when all the data from the
client have been read already. This specifically happens on response
redirects after a POST to a server. This is an easy situation to detect.
It could later be improved to cover the cases where option
http-buffer-request is used.
Sometimes it's problematic not to have "http-response redirect" rules,
for example to perform a browser-based redirect based on certain server
conditions (eg: match of a header).
This patch adds "http-response redirect location <fmt>" which gives
enough flexibility for most imaginable operations. The connection to
the server is closed when this is performed so that we don't risk to
forward any pending data from the server.
Any pending response data are trimmed so that we don't risk to
forward anything pending to the client. It's harmless to also do that
for requests so we don't need to consider the direction.
In order to support http-response redirect, the parsing needs to be
adapted a little bit to only support the "location" type, and to
adjust the log-format parser so that it knows the direction of the
sample fetch calls.
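A hedged example of the new rule (status code and target are illustrative):

    backend app
        # browser-side redirect decided from the server's response;
        # the server connection is closed and pending data trimmed
        http-response redirect location /maintenance.html if { status 503 }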
This function was made to perform a redirect on requests only; it was
using a message or txn->req in an inconsistent way and did not consider
the possibility that it could be used for the other direction. Let's
clean it up to take both a request and a response message.
This patch adds a http response capture keyword with the same behavior
as the previous patch called "MEDIUM: capture: Allow capture with slot
identifier".
This patch modifies the current http-request capture function
and adds a new keyword "id" that permits identifying a capture slot.
If the identifier doesn't exist, the action fails silently.
Note that this patch removes an unused list initialisation, which seems
to have been inherited from a copy/paste. It's harmless and does not need
to be backported.
LIST_INIT((struct list *)&rule->arg.act.p[0]);
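A hedged configuration example (the slot must have been declared beforehand,
ids being assigned by "declare capture"):

    frontend fe
        declare capture request len 64
        # store the Host header in request capture slot 0
        http-request capture req.hdr(host) id 0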
This patch adds "capture-req" and "capture-res". These two converters
capture their entry in the allocated slot given in argument and pass
the input on the output.
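A hedged usage sketch (header and slot are illustrative):

    frontend fe
        declare capture request len 128
        # capture the User-Agent into slot 0 while passing the value
        # through to the rest of the expression
        http-request set-header X-UA %[req.fhdr(user-agent),capture-req(0)]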
These ones were already obsoleted in 1.4, marked for removal in 1.5,
and not documented anymore. They used to emit warnings, and do still
require quite some code to stay in place. Let's remove them now.
The "name" and "name_len" arguments in function "smp_fetch_url_param"
could be left uninitialized for subsequent calls.
[wt: no backport needed, this is a 1.6 regression introduced by
commit 4fdc74c ("MINOR: http: split the url_param in two parts")]
This patch is part of the body_param fetch work. The goal is to have a
generic url-encoded parser which can be used for parsing both the query
string and the body.
There are some situations where it's desirable to scan multiple occurrences
of the same parameter name in the query string. This change ensures this can
work, even with an empty name, which will then iterate over all parameters.
This patch removes the structs "session", "stream" and "proxy" from
the sample-fetch and converter function prototypes.
This makes the prototypes lighter.
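Schematically, the change looks like this (prototypes abridged and hedged,
not copied from the tree):

    /* before: the whole context traveled through every prototype */
    int smp_fetch_example(struct proxy *px, struct session *sess,
                          struct stream *strm, unsigned int opt,
                          const struct arg *args, struct sample *smp,
                          const char *kw, void *private);

    /* after: the context is carried by the sample itself */
    int smp_fetch_example(const struct arg *args, struct sample *smp,
                          const char *kw, void *private);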
There's an issue related with shutting down POST transfers or closing the
connection after the end of the upload : the shutdown is forwarded to the
server regardless of the abortonclose option. The problem it causes is that
during a scan, brute force or whatever, it becomes possible that all source
ports are exhausted with all sockets in TIME_WAIT state.
There are multiple issues at once in fact :
- no action is done for the close, it automatically happens at the lower
layers thanks to channel_auto_close(), so we cannot act on NOLINGER ;
- we *do* want to continue to send a clean shutdown in tunnel mode because
some protocols transported over HTTP may need this, regardless of option
abortonclose, thus we can't set the option unconditionally
- for all other modes, we do want to close the dirty way because we
know whether we've sent everything or not, and we don't want to eat
all source ports.
The solution is a bit complex and applies to DONE/TUNNEL states :
1) disable automatic close for everything not a tunnel and not just
keep-alive / server-close. Force-close is now covered, as is HTTP/1.0
which implicitly works in force-close mode ;
2) when processing option abortonclose, we know we can disable lingering
if the client has closed and the connection is not in tunnel mode.
Since the last case above leads to a situation where the client side reports
an error, we know the connection will not be reused, so leaving the flag on
the stream-interface is safe. A client closing in the middle of the data
transmission already aborts the transaction so this case is not a problem.
This fix must be backported to 1.5 where the problem was detected.
Since the code was mostly inspired by the tcp-request parser, it did a
few things wrong because the two don't work the same way. The "len"
argument could be mismatched and then the length could be used
uninitialized.
This is only possible in frontends of course, but it will finally
make it possible to capture arbitrary http parts, including URL
parameters or parts of the message body.
It's worth noting that an ugly (char **) cast had to be done to
call sample_fetch_string(), which is caused by 5 or 6 levels
of inheritance of this type in the API. Here it's harmless since
the function uses it as a const, but this API madness must be
fixed, starting with the one or two rare functions that modify
the args and inflict this on each and every keyword parser.
(cherry picked from commit 484a4f38460593919a1c1d9a047a043198d69f45)
Body processing is still fairly limited, but this is a start. It becomes
possible to apply regex to find contents in order to decide where to route
a request for example. Only the first chunk is parsed for now, and the
response is not yet available (the parsing function must be duplicated for
this).
req.body : binary
This returns the HTTP request's available body as a block of data. It
requires that the request body has been buffered and made available using
"option http-buffer-request". In case of a chunked-encoded body, currently
only the first chunk is analyzed.
req.body_len : integer
This returns the length of the HTTP request's available body in bytes. It may
be lower than the advertised length if the body is larger than the buffer. It
requires that the request body has been buffered and made available using
"option http-buffer-request".
req.body_size : integer
This returns the advertised length of the HTTP request's body in bytes. It
will represent the advertised Content-Length header, or the size of the first
chunk in case of chunked encoding. In order to parse the chunks, it requires
that the request body has been buffered and made available using
"option http-buffer-request".
It is sometimes desirable to wait for the body of an HTTP request before
taking a decision. This is what is being done by "balance url_param" for
example. The first use case is to buffer requests from slow clients before
connecting to the server. Another use case consists in taking the routing
decision based on the request body's contents. This option placed in a
frontend or backend forces the HTTP processing to wait until either the whole
body is received, or the request buffer is full, or the first chunk is
complete in case of chunked encoding. It can have undesired side effects with
some applications abusing HTTP by expecting unbuffered transmissions between
the frontend and the backend, so this should definitely not be used by
default.
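A hedged example of the "balance url_param" use case mentioned above (the
parameter name is illustrative):

    backend post_handlers
        # wait for the body so that the url_param hash can inspect it
        option http-buffer-request
        balance url_param session_id check_post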
Note that it would not work for the response because we don't reset the
message state before starting to forward. For the response we would need
to 1) reset the message state to MSG_100_SENT or BODY, and 2) reset
body_len in case of chunked encoding to avoid counting it twice.
Due to the fact that we were still considering only msg->sov for the
first byte of data after calling http_parse_chunk_size(), we used to
miscompute the input data size and to count the CRLF and the chunk size
as part of the input data. The effect is that it was possible to release
the processing with 3 or 4 missing bytes, especially if they're typed by
hand during debugging sessions. This can cause the stats page to return
some errors in admin mode, and the url_param balance algorithm to fail
to properly hash a body input.
This fix must be backported to 1.5.
Recently some browsers started to implement a "pre-connect" feature
consisting in speculatively connecting to some recently visited web sites
just in case the user would like to visit them. This results in many
connections being established to web sites, which end up in 408 Request
Timeout if the timeout strikes first, or 400 Bad Request when the browser
decides to close them first. These ones pollute the log and feed the error
counters. There was already "option dontlognull" but it's insufficient in
this case. Instead, this option does the following things :
- prevent any 400/408 message from being sent to the client if nothing
was received over a connection before it was closed ;
- prevent any log from being emitted in this situation ;
- prevent any error counter from being incremented
That way the empty connection is silently ignored. Note that it is better
not to use this unless it is clear that it is needed, because it will hide
real problems. The most common reason for not receiving a request and seeing
a 408 is due to an MTU inconsistency between the client and an intermediary
element such as a VPN, which blocks too large packets. These issues are
generally seen with POST requests as well as GET with large cookies. The logs
are often the only way to detect them.
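The text doesn't name the directive; assuming it is the "option
http-ignore-probes" keyword, a hedged example:

    defaults
        mode http
        # silently drop connections closed without any received data,
        # instead of logging 400/408 and bumping error counters
        option http-ignore-probes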
This patch should be backported to 1.5 since it avoids false alerts and
makes it easier to monitor haproxy's status.
There's not much reason for continuing to accept HTTP/0.9 requests
nowadays except for manual testing. Now we disable support for these
by default, unless option accept-invalid-http-request is specified,
in which case they continue to be upgraded to 1.0.
While RFC2616 used to allow an indeterminate number of digits for the
major and minor components of the HTTP version, RFC7230 has reduced
that to a single digit for each.
If a server can't properly parse the version string and falls back to 0.9,
it could then send a head-less response whose payload would be taken for
headers, which could confuse downstream agents.
Since there's no more reason for supporting a version scheme that was
never used, let's upgrade to the updated version of the standard. It is
still possible to enforce support for the old behaviour using options
accept-invalid-http-request and accept-invalid-http-response.
It would be wise to backport this to 1.5 as well just in case.
The spec mandates that Content-Length must be removed from messages when a
Transfer-Encoding header is present, not just when it is valid.
This must be backported to 1.5 and 1.4.
The rules related to how to handle a bad transfer-encoding header (one
where "chunked" is not at the final place) have evolved to mandate an
abort when this happens in the request. Previously it was only a close
(which is still valid for the server side).
This must be backported to 1.5 and 1.4.
While Transfer-Encoding is HTTP/1.1, we must still parse it in HTTP/1.0
in case an agent sends it, because it's likely that the other side might
use it as well, causing confusion. This will also result in getting rid
of the Content-Length header in such abnormal situations and in having
a clean connection.
This must be backported to 1.5 and 1.4.