The HTTP version parser used in ACLs has long been a string and
still had its own parser. This makes no sense, switch it to use
the standard string parser.
Since "hdr" and "cookie" were ambiguously referring to the request or response
depending on the context, we need a way to explicitly specify the direction.
By prefixing the fetches names with "req." and "res.", we can now restrict such
fetches to the appropriate direction. At the moment the fetches are explicitly
declared by later we might think about having an automatic match when "req." or
"res." appears. These explicit fetches are now used by the relevant ACLs.
ACL fetch being inherited from the sample fetch keyword, we don't need
anymore to specify what function to use to validate the fetch arguments.
Note that the job is still done in the ACL parsing code based on elements
from the sample fetch structs.
Now that ACLs solely rely on sample fetch functions, make them use the
same arg mask. All inconsistencies have been fixed separately prior to
this patch, so this patch almost only adds a new pointer indirection
and removes all references to ARG*() in the definitions.
The parsing is still performed by the ACL code though.
ACL fetch functions used to directly reference a fetch function. Now
that all ACL fetches have their sample fetches equivalent, we can make
ACLs reference a sample fetch keyword instead.
In order to simplify the code, a sample keyword name may be NULL if it
is the same as the ACL's, which is the most common case.
A minor change appeared, http_auth always expects one argument though
the ACL allowed it to be missing and reported as such afterwards, so
fix the ACL to match this. This is not really a bug.
The following sample fetch functions were only usable by ACLs but are now
usable by sample fetches too :
cook, cook_cnt, cook_val, hdr_cnt, hdr_ip, hdr_val, http_auth,
http_auth_group, http_first_req, method, req_proto_http, req_ver,
resp_ver, scook, scook_cnt, scook_val, shdr, shdr_cnt, shdr_ip,
shdr_val, status, urlp, urlp_val,
Most of them won't bring much benefit at the moment, or are even aliases of
existing ones, however they'll be needed for ACL->SMP convergence.
A new val_usr() function was added to resolve userlist names into pointers.
The http_auth_group ACL forgot to make its first argument mandatory, so
there was a check in cfgparse to report a vague error. Now that args are
correctly parsed, let's report something more precise.
All urlp* ACLs now support an optional 3rd argument like their sample
counter-part which is the optional delimiter.
The fetch functions have been renamed "smp_fetch_*".
Some args controls on the sample keywords have been relaxed so that we
can soon use them for ACLs :
- cookie now accepts to have an optional name ; it will return the
first matching cookie if the name is not set ;
- same for set-cookie and hdr
If a log-format involves some sample fetches that may not be present at
the logging instant, we can now report a warning.
Note that this is done both for log-format and for add-header and carefully
respects the original fetch keyword's capabilities.
Samples fetches were relying on two flags SMP_CAP_REQ/SMP_CAP_RES to describe
whether they were compatible with requests rules or with response rules. This
was never reliable because we need a finer granularity (eg: an HTTP request
method needs to parse an HTTP request, and is available past this point).
Some fetches are also dependant on the context (eg: "hdr" uses request or
response depending where it's involved, causing some abiguity).
In order to solve this, we need to precisely indicate in fetches what they
use, and their users will have to compare with what they have.
So now we have a bunch of bits indicating where the sample is fetched in the
processing chain, with a few variants indicating for some of them if it is
permanent or volatile (eg: an HTTP status is stored into the transaction so
it is permanent, despite being caught in the response contents).
The fetches also have a second mask indicating their validity domain. This one
is computed from a conversion table at registration time, so there is no need
for doing it by hand. This validity domain consists in a bitmask with one bit
set for each usage point in the processing chain. Some provisions were made
for upcoming controls such as connection-based TCP rules which apply on top of
the connection layer but before instantiating the session.
Then everywhere a fetch is used, the bit for the control point is checked in
the fetch's validity domain, and it becomes possible to finely ensure that a
fetch will work or not.
Note that we need these two separate bitfields because some fetches are usable
both in request and response (eg: "hdr", "payload"). So the keyword will have
a "use" field made of a combination of several SMP_USE_* values, which will be
converted into a wider list of SMP_VAL_* flags.
The knowledge of permanent vs dynamic information has disappeared for now, as
it was never used. Later we'll probably reintroduce it differently when
dealing with variables. Its only use at the moment could have been to avoid
caching a dynamic rate measurement, but nothing is cached as of now.
This flag is used on ACL matches that support being looking up patterns
in trees. At the moment, only strings and IPs support tree-based lookups,
but the flag is randomly set also on integers and binary data, and is not
even always set on strings nor IPs.
Better get rid of this mess by only relying on the matching function to
decide whether or not it supports tree-based lookups, this is safer and
easier to maintain.
During normal HTTP request processing, request buffers are realigned if
there are less than global.maxrewrite bytes available after them, in
order to leave enough room for rewriting headers after the request. This
is done in http_wait_for_request().
However, if some HTTP inspection happens during a "tcp-request content"
rule, this realignment is not performed. In theory this is not a problem
because empty buffers are always aligned and TCP inspection happens at
the beginning of a connection. But with HTTP keep-alive, it also happens
at the beginning of each subsequent request. So if a second request was
pipelined by the client before the first one had a chance to be forwarded,
the second request will not be realigned. Then, http_wait_for_request()
will not perform such a realignment either because the request was
already parsed and marked as such. The consequence of this, is that the
rewrite of a sufficient number of such pipelined, unaligned requests may
leave less room past the request been processed than the configured
reserve, which can lead to a buffer overflow if request processing appends
some data past the end of the buffer.
A number of conditions are required for the bug to be triggered :
- HTTP keep-alive must be enabled ;
- HTTP inspection in TCP rules must be used ;
- some request appending rules are needed (reqadd, x-forwarded-for)
- since empty buffers are always realigned, the client must pipeline
enough requests so that the buffer always contains something till
the point where there is no more room for rewriting.
While such a configuration is quite unlikely to be met (which is
confirmed by the bug's lifetime), a few people do use these features
together for very specific usages. And more importantly, writing such
a configuration and the request to attack it is trivial.
A quick workaround consists in forcing keep-alive off by adding
"option httpclose" or "option forceclose" in the frontend. Alternatively,
disabling HTTP-based TCP inspection rules enough if the application
supports it.
At first glance, this bug does not look like it could lead to remote code
execution, as the overflowing part is controlled by the configuration and
not by the user. But some deeper analysis should be performed to confirm
this. And anyway, corrupting the process' memory and crashing it is quite
trivial.
Special thanks go to Yves Lafon from the W3C who reported this bug and
deployed significant efforts to collect the relevant data needed to
understand it in less than one week.
CVE-2013-1912 was assigned to this issue.
Note that 1.4 is also affected so the fix must be backported.
Sander Klein reported that since last snapshot, some downloads would
hang from nginx but succeed from apache. The culprit was not too hard
to find given the low number of recent changes affecting the data path.
Commit d655ffe slightly reorganized the HTTP state machine and
introduced this regression. The reason is that we must never jump
into the MSG_DONE case without first flushing remaining data because
this is not done anymore afterwards. This part is scheduled for
being reorganized since it's totally ugly especially since we added
compression, and this regression is an illustration of its readability.
The issue is entirely dependant on the server close sequence, which
explains why it was reproducible only with nginx here.
This commit fixed a bug and introduced a new one at the same time.
It's a stupid typo, the index to store the context is [0], not [2].
The effect is that parsing the header can loop forever if multiple
headers are found. This issue was reported by Lukas Tribus.
Baptiste Assmann reported that the cook*() ACLs do not work anymore.
The reason is the way we store the hdr_ctx between subsequent calls
to smp_fetch_cookie() since commit 3740635b (1.5-dev10).
The smp->ctx.a[] storage holds up to 8 pointers. It is not meant for
generic storage. We used to store hdr_ctx in the ctx, but while it used
to just fit for smp_fetch_hdr(), it does not for smp_fetch_cookie()
since we stored it at offset 2.
The correct solution is to use this storage to store a pointer to the
current hdr_ctx struct which is statically allocated.
An issue reported by David Coulson is that when using http-send-name-header,
the response processing would randomly be performed. The issue was first
diagnosed by Cyril Bont as being related to a time race when processing
the closing of the response.
In practice, the issue is a bit trickier. It happens that
http_send_name_header() did not update msg->sol after a rewrite. This
counter is supposed to point to the beginning of the message's body
once headers are scheduled for being forwarded. And not updating it
means that the first forwarding of the request headers in
http_request_forward_body() does not send the correct count, leaving
some bytes in chn->to_forward.
Then if the server sends its response in a single packet with the
close, the stream interface switches to state SI_ST_DIS which in
turn moves to SI_ST_CLO in process_session(), and to close the
outgoing connection. This is detected by http_request_forward_body(),
which then switches the request message to the error state, and syncs
all FSMs and removes any response analyser.
The response analyser being removed, no processing is performed on
the response buffer, which is tunnelled as-is to the client.
Of course, the correct fix consists in having http_send_name_header()
update msg->sol. Normally this ought not to have been needed, but it
is an abuse to modify data already scheduled for being forwarded, so
it is expected that such specific handling has to be done there. Better
not have generic functions deal with such cases, so that it does not
become the standard.
Note: 1.4 does not have this issue even if it does not update the
pointer either, because it forwards from msg->som which is not
updated at the moment the connect() succeeds. So no backport is
required.
Patch 6cbbdbf3 fixed the missing "-" delimitors in logs but it caused
them to be emitted with "http-request add-header", eventhough it was
correctly fixed for the unique-id format. Fix this by simply removing
LOG_OPT_MANDATORY in this case.
Commit 2b0108ad accidently got rid of the ability to emit a "-" for
empty log fields. This can happen for captured request and response
cookies, as well as for fetches. Since we don't want to have this done
for headers however, we set the default log method when parsing the
format. It is still possible to force the desired mode using +M/-M.
In select_compression_response_header(), some tests are rather confusing
as the "fail" label is used to deinitialize the compression context for
the session while it's branched only before initialization succeeds. The
test is always false here and the dereferencing of the comp_algo pointer
which might be null is also confusing. Remove that code which is not needed
anymore since commit ec3e3890 got rid of the latest issues.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
srv cannot be null in http_perform_server_redirect(), as it's taken
from the stream interface's target which is always valid for a
server-based redirect, and it was already dereferenced above, so in
practice, gcc already removes the test anyway.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
As stated in both RFC2616 and the http-bis drafts, Cache-Control:
no-transform must be looked up in the response since we're modifying
the response. However, its presence in the request is irrelevant to
any changes in the response :
7.2.1.6. no-transform
The "no-transform" request directive indicates that an intermediary
(whether or not it implements a cache) MUST NOT change the Content-
Encoding, Content-Range or Content-Type request header fields, nor
the request representation.
7.2.2.9. no-transform
The "no-transform" response directive indicates that an intermediary
(regardless of whether it implements a cache) MUST NOT change the
Content-Encoding, Content-Range or Content-Type response header
fields, nor the response representation.
Note: according to the specs, we're supposed to emit the following
response header :
Warning: 214 transformation applied
However no other product seems to do it, so the effect on user agents
is unclear.
The "reqtarpit" rule is not very handy to use. Now that we have more
flexibility with "http-request", let's finally make the tarpit rules
usable there.
There are still semantical differences between apply_filters_to_request()
and http_req_get_intercept_rule() because the former updates the counters
while the latter does not. So we currently have almost similar code leafs
for similar conditions, but this should be cleaned up later.
These are exactly the same as the classic redirect rules except
that they can be interleaved with other http-request rules for
more flexibility.
The redirect parser should probably be changed to stop at the condition
so that the caller puts its own condition pointer. At the moment, the
redirect rule and condition are parsed at once by build_redirect_rule()
and the condition is assigned to the http_req_rule.
We now have http_apply_redirect_rule() which does all the redirect-specific
job instead of having this inside http_process_req_common().
Also one of the benefit gained from uniformizing this code is that both
keep-alive and close response do emit the PR-- flags. The fix for the
flags could probably be backported to 1.4 though it's very minor.
The previous function http_perform_redirect() was becoming confusing
so it was renamed http_perform_server_redirect() since it only applies
to server-based redirection.
Several bugs were introduced recently due to a misunderstanding of how
this function works and what it was supposed to do. Since it's supposed
to only return the pointer to a rule which aborts further processing of
the request, let's rename it to avoid further issues.
The function was also slightly cleaned up without any functional change.
It happens that all of them call parse_logformat_line() which sets
proxy->to_log with a number of flags affecting the line format for
all three users. For example, having a unique-id specified disables
the default log-format since fe->to_log is tested when the session
is established.
Similarly, having "option logasap" will cause "+" to be inserted in
unique-id or headers referencing some of the fields depending on
LW_BYTES.
This patch first removes most of the dependency on fe->to_log whenever
possible. The first possible cleanup is to stop checking fe->to_log
for being null, considering that it always contains at least LW_INIT
when any such usage is made of the log-format!
Also, some checks are wrong. s->logs.logwait cannot be nulled by
"logwait &= ~LW_*" since LW_INIT is always there. This results in
getting the wrong log at the end of a request or session when a
unique-id or add-header is set, because logwait is still not null
but the log-format is not checked.
Further cleanups are required. Most LW_* flags should be removed or at
least replaced with what they really mean (eg: depend on client-side
connection, depend on server-side connection, etc...) and this should
only affect logging, not other mechanisms.
This patch fixes the default log-format and tries to limit interferences
between the log formats, but does not pretend to do more for the moment,
since it's the most visible breakage.
After the response headers are sent and the request processing is done,
the buffers are wiped out and the stream interface is closed. We must
then disable the request analysers, otherwise some processing will
happen on a closed stream interface and empty buffers which do not
match, causing all sort of crashes. This issue was introduced with
recent work on the stats, and was reported by Seri.
Previous commit was still wrong, it broke add-header and set-header
because we don't want to leave on these actions.
The http_check_access_rule() function should be redesigned, it was
initially thought for allow/deny rules but now it is executing other
non-final rules and at the same time returning a pointer to the last
final rule. That becomes a bit confusing and will need to be addressed
before we implement redirect and return.
This commit adding http-request add-header/set-header unfortunately introduced
a regression to the handling of the stats page which is not matched anymore.
Thanks to Dmitry Sivachenko for reporting this.
These two new statements allow to pass information extracted from the request
to the server. It's particularly useful for passing SSL information to the
server, but may be used for various other purposes such as combining headers
together to emulate internal variables.
At the moment, we need trash chunks almost everywhere and the only
correctly implemented one is in the sample code. Let's move this to
the chunks so that all other places can use this allocator.
Additionally, the get_trash_chunk() function now really returns two
different chunks. Previously it used to always overwrite the same
chunk and point it to a different buffer, which was a bit tricky
because it's not obvious that two consecutive results do alias each
other.
The HTTP header injection that are performed in dumpstats when responding
or when redirecting a POST request have nothing to do in dumpstats. They
do not use any state from the stats, and are 100% HTTP. Let's make the
headers there in the HTTP core, and have dumpstats only produce stats.
The dumpstats code looks like a spaghetti plate. Several functions are
supposed to be able to do several things but rely on complex states to
dispatch the work to independant functions. Most of the HTML output is
performed within the switch/case statements of the whole state machine.
Let's clean this up by adding new functions to emit the data and have
a few more iterators to avoid relying on so complex states.
The new stats dump sequence looks like this for CLI and for HTTP :
cli_io_handler()
-> stats_dump_sess_to_buffer() // "show sess"
-> stats_dump_errors_to_buffer() // "show errors"
-> stats_dump_raw_info_to_buffer() // "show info"
-> stats_dump_raw_info()
-> stats_dump_raw_stat_to_buffer() // "show stat"
-> stats_dump_csv_header()
-> stats_dump_proxy()
-> stats_dump_px_hdr()
-> stats_dump_fe_stats()
-> stats_dump_li_stats()
-> stats_dump_sv_stats()
-> stats_dump_be_stats()
-> stats_dump_px_end()
http_stats_io_handler()
-> stats_http_redir()
-> stats_dump_http() // also emits the HTTP headers
-> stats_dump_html_head() // emits the HTML headers
-> stats_dump_csv_header() // emits the CSV headers (same as above)
-> stats_dump_http_info() // note: ignores non-HTML output
-> stats_dump_proxy() // same as above
-> stats_dump_http_end() // emits HTML trailer
When a server responds prematurely to a POST request, haproxy used to
cause the transfer to be aborted before the end. This is problematic
because this causes the client to receive a TCP reset when it tries to
push more data, generally preventing it from receiving the response
which contain the reason for the premature reponse (eg: "entity too
large" or an authentication request).
From now on we take care of allowing the upload traffic to flow to the
server even when the response has been received, since the server is
supposed to drain it. That way the client receives the server response.
This bug has been present since 1.4 and the fix should probably be
backported there.
The two ACL fetches "resp_ver" and "status", if used in a request despite
the warning, would return a match of zero length. This is inappropriate,
better return a non-match to be more consistent with other ACL processing.
This returns the concatenation of the base32 fetch and the src fetch.
The resulting type is of type binary, with a size of 8 or 20 bytes
depending on the source address family. This can be used to track
per-IP, per-URL counters.
This returns a 32-bit hash of the value returned by the "base"
fetch method above. This is useful to track per-URL activity on
high traffic sites without having to store all URLs. Instead a
shorter hash is stored, saving a lot of memory. The output type
is an unsigned integer.
Until now it was only possible to use track-sc1/sc2 with "src" which
is the IPv4 source address. Now we can use track-sc1/sc2 with any fetch
as well as any transformation type. It works just like the "stick"
directive.
Samples are automatically converted to the correct types for the table.
Only "tcp-request content" rules may use L7 information, and such information
must already be present when the tracking is set up. For example it becomes
possible to track the IP address passed in the X-Forwarded-For header.
HTTP request processing now also considers tracking from backend rules
because we want to be able to update the counters even when the request
was already parsed and tracked.
Some more controls need to be performed (eg: samples do not distinguish
between L4 and L6).
If a client aborts a request with an error (typically a TCP reset), we must
log a 400. Till now we did not set the status nor close the stream interface,
causing the request to attempt to be forwarded and logging a 503.
Should be backported to 1.4 which is affected as well.
To ensure that we only count when a response was compressed, we also
check for the SN_COMP_READY flag which indicates that the compression
was effectively initialized. Comp_algo alone is meaningless.