As usual, when touching any is* function, Solaris complains about the
type of the element being checked. Better backport this to 1.5 since
nobody knows what the emitted code looks like since macros are used
instead of functions.
We used to only clear flags when reusing the static sample before calling
sample_process(), but that's not enough because there's a context in samples
that can be used by some fetch functions such as auth, headers and cookies,
and not reinitializing it risks that a pointer of a different type is used
in the wrong context.
An example configuration which triggers the case consists in mixing hdr()
and http_auth_group() which both make use of contexts :
http-request add-header foo2 %[hdr(host)],%[http_auth_group(foo)]
The solution is simple, initialize all the sample and not just the flags.
This fix must be backported into 1.5 since it was introduced in 1.5-dev19.
Baptiste Assmann reported a corner case in the releasing of stick-counters:
we release content-aware counters before logging. In the past it was not a
problem, but since now we can log them it, it prevents one from logging
their value. Simply switching the log production and the release of the
counter fixes the issue.
This should be backported into 1.5.
The sample fetch function "base" makes use of the trash which is also
used by set-header/add-header etc... everything which builds a formated
line. So we end up with some junk in the header if base is in use. Let's
fix this as all other fetches by using a trash chunk instead.
This bug was reported by Baptiste Assmann, and also affects 1.5.
This is the 3rd regression caused by the changes below. The latest to
date was reported by Finn Arne Gangstad. If a server responds with no
content-length and the client's FIN is never received, either we leak
the client-side FD or we spin at 100% CPU if timeout client-fin is set.
Enough is enough. The amount of tricks needed to cover these side-effects
starts to look like used toilet paper stacked over a chocolate cake. I
don't want to eat that cake anymore!
All this to avoid reporting a server-side timeout when a client stops
uploading data and haproxy expires faster than the server... A lot of
"ifs" resulting in a technically valid log that doesn't always please
users, and whose alternative causes that many issues for all others
users.
So let's revert this crap merged since 1.5-dev25 :
Revert "CLEANUP: http: don't clear CF_READ_NOEXP twice"
This reverts commit 1592d1e72a.
Revert "BUG/MEDIUM: http: clear CF_READ_NOEXP when preparing a new transaction"
This reverts commit 77d29029af.
Revert "BUG/MEDIUM: session: don't clear CF_READ_NOEXP if analysers are not called"
This reverts commit 0943757a21.
Revert "BUG/MEDIUM: http: disable server-side expiration until client has sent the body"
This reverts commit 3bed5e9337.
Revert "BUG/MEDIUM: http: correctly report request body timeouts"
This reverts commit b9edf8fbec.
Revert "BUG/MEDIUM: http/session: disable client-side expiration only after body"
This reverts commit b1982e27aa.
If a cleaner AND SAFER way to do something equivalent in 1.6-dev, we *might*
consider backporting it to 1.5, but given the vicious bugs that have surfaced
since, I doubt it will happen any time soon.
Fortunately, that crap never made it into 1.4 so no backport is needed.
The new regex function can use string and length. The HAproxy buffer are
not null-terminated, and the use of the regex_exec* functions implies
the add of this null character. This patch replace these function by the
functions which takes a string and length as input.
Just the file "proto_http.c" is change because this one is more executed
than other. The file "checks.c" have a very low usage, and it is not
interesting to change it. Furthermore, the buffer used by "checks.c" are
null-terminated.
This patch remove all references of standard regex in haproxy. The last
remaining references are only in the regex.[ch] files.
In the file src/checks.c, the original function uses a "pmatch" array.
In fact this array is unused. This patch remove it.
When I renamed the modify-header action to replace-value, one of them
was mistakenly set to "replace-val" instead. Additionally, differentiation
of the two actions must be done on args[0][8] and not *args[8]. Thanks
Thierry for spotting...
This patch adds two new actions to http-request and http-response rulesets :
- replace-header : replace a whole header line, suited for headers
which might contain commas
- replace-value : replace a single header value, suited for headers
defined as lists.
The match consists in a regex, and the replacement string takes a log-format
and supports back-references.
Using the last rate counters, we now compute the queue, connect, response
and total times per server and per backend with a 95% accuracy over the last
1024 samples. The operation is cheap so we don't need to condition it.
This patch adds support for captures with no header name. The purpose
is to allow extra captures to be defined and logged along with the
header captures.
Commit b1982e2 ("BUG/MEDIUM: http/session: disable client-side expiration
only after body") was tricky and caused an issue which was fixed by commit
0943757 ("BUG/MEDIUM: session: don't clear CF_READ_NOEXP if analysers are
not called"). But that's not enough, another issue was introduced and further
emphasized by last fix.
The issue is that the CF_READ_NOEXP flag needs to be cleared when waiting
for a new request over that connection, otherwise we cannot expire anymore
an idle connection waiting for a new request.
This explains the neverending keepalives reported by at least 3 different
persons since dev24. No backport is needed.
Currently exp_replace() (which is used in reqrep/reqirep) is
vulnerable to a buffer overrun. I have been able to reproduce it using
the attached configuration file and issuing the following command:
wget -O - -S -q http://localhost:8000/`perl -e 'print "a"x4000'`/cookie.php
Str was being checked only in in while (str) and it was possible to
read past that when more than one character was being accessed in the
loop.
WT:
Note that this bug is only marked MEDIUM because configurations
capable of triggering this bug are very unlikely to exist at all due
to the fact that most rewrites consist in static string additions
that largely fit into the reserved area (8kB by default).
This fix should also be backported to 1.4 and possibly even 1.3 since
it seems to have been present since 1.1 or so.
Config:
-------
global
maxconn 500
stats socket /tmp/haproxy.sock mode 600
defaults
timeout client 1000
timeout connect 5000
timeout server 5000
retries 1
option redispatch
listen stats
bind :8080
mode http
stats enable
stats uri /stats
stats show-legends
listen tcp_1
bind :8000
mode http
maxconn 400
balance roundrobin
reqrep ^([^\ :]*)\ /(.*)/(.*)\.php(.*) \1\ /\3.php?arg=\2\2\2\2\2\2\2\2\2\2\2\2\2\4
server srv1 127.0.0.1:9000 check port 9000 inter 1000 fall 1
server srv2 127.0.0.1:9001 check port 9001 inter 1000 fall 1
Servers used to have 3 flags to store a state, now they have 4 states
instead. This avoids lots of confusion for the 4 remaining undefined
states.
The encoding from the previous to the new states can be represented
this way :
SRV_STF_RUNNING
| SRV_STF_GOINGDOWN
| | SRV_STF_WARMINGUP
| | |
0 x x SRV_ST_STOPPED
1 0 0 SRV_ST_RUNNING
1 0 1 SRV_ST_STARTING
1 1 x SRV_ST_STOPPING
Note that the case where all bits were set used to exist and was randomly
dealt with. For example, the task was not stopped, the throttle value was
still updated and reported in the stats and in the http_server_state header.
It was the same if the server was stopped by the agent or for maintenance.
It's worth noting that the internal function names are still quite confusing.
Till now, the server's state and flags were all saved as a single bit
field. It causes some difficulties because we'd like to have an enum
for the state and separate flags.
This commit starts by splitting them in two distinct fields. The first
one is srv->state (with its counter-part srv->prev_state) which are now
enums, but which still contain bits (SRV_STF_*).
The flags now lie in their own field (srv->flags).
The function srv_is_usable() was updated to use the enum as input, since
it already used to deal only with the state.
Note that currently, the maintenance mode is still in the state for
simplicity, but it must move as well.
It's the final part of the 2 previous patches. We prevent the server from
timing out if we still have some data to pass to it. That way, even if the
server runs with a short timeout and the client with a large one, the server
side timeout will only start to count once the client sends everything. This
ensures we don't report a 504 before the server gets the whole request.
It is not certain whether the 1.4 state machine is fully compatible with
this change. Since the purpose is only to ensure that we never report a
server error before a client error if some data are missing from the client
and when the server-side timeout is smaller than or equal to the client's,
it's probably not worth attempting the backport.
This is the continuation of previous patch "BUG/MEDIUM: http/session:
disable client-side expiration only after body".
This one takes care of properly reporting the client-side read timeout
when waiting for a body from the client. Since the timeout may happen
before or after the server starts to respond, we have to take care of
the situation in three different ways :
- if the server does not read our data fast enough, we emit a 504
if we're waiting for headers, or we simply break the connection
if headers were already received. We report either sH or sD
depending on whether we've seen headers or not.
- if the server has not yet started to respond, but has read all of
the client's data and we're still waiting for more data from the
client, we can safely emit a 408 and abort the request ;
- if the server has already started to respond (thus it's a transfer
timeout during a bidirectional exchange), then we silently break
the connection, and only the session flags will indicate in the
logs that something went wrong with client or server side.
This bug is tagged MEDIUM because it touches very sensible areas, however
its impact is very low. It might be worth performing a careful backport
to 1.4 once it has been confirmed that everything is correct and that it
does not introduce any regression.
For a very long time, back in the v1.3 days, we used to rely on a trick
to avoid expiring the client side while transferring a payload to the
server. The problem was that if a client was able to quickly fill the
buffers, and these buffers took some time to reach the server, the
client should not expire while not sending anything.
In order to cover this situation, the client-side timeout was disabled
once the connection to the server was OK, since it implied that we would
at least expire on the server if required.
But there is a drawback to this : if a client stops uploading data before
the end, its timeout is not enforced and we only expire on the server's
timeout, so the logs report a 504.
Since 1.4, we have message body analysers which ensure that we know whether
all the expected data was received or not (HTTP_MSG_DATA or HTTP_MSG_DONE).
So we can fix this problem by disabling the client-side or server-side
timeout at the end of the transfer for the respective side instead of
having it unconditionally in session.c during all the transfer.
With this, the logs now report the correct side for the timeout. Note that
this patch is not enough, because another issue remains : the HTTP body
forwarders do not abort upon timeout, they simply rely on the generic
handling from session.c. So for now, the session is still aborted when
reaching the server timeout, but the culprit is properly reported. A
subsequent patch will address this specific point.
This bug was tagged MEDIUM because of the changes performed. The issue
it fixes is minor however. After some cooling down, it may be backported
to 1.4.
It was reported by and discussed with Rachel Chavez and Patrick Hemmer
on the mailing list.
On the mailing list, seri0528@naver.com reported an issue when
using balance url_param or balance uri. The request would sometimes
stall forever.
Cyril Bont managed to reproduce it with the configuration below :
listen test :80
mode http
balance url_param q
hash-type consistent
server s demo.1wt.eu:80
and found it appeared with this commit : 80a92c0 ("BUG/MEDIUM: http:
don't start to forward request data before the connect").
The bug is subtle but real. The problem is that the HTTP request
forwarding analyzer refrains from starting to parse the request
body when some LB algorithms might need the body contents, in order
to preserve the data pointer and avoid moving things around during
analysis in case a redispatch is later needed. And in order to detect
that the connection establishes, it watches the response channel's
CF_READ_ATTACHED flag.
The problem is that a request analyzer is not subscribed to a response
channel, so it will only see changes when woken for other (generally
correlated) reasons, such as the fact that part of the request could
be sent. And since the CF_READ_ATTACHED flag is cleared once leaving
process_session(), it is important not to miss it. It simply happens
that sometimes the server starts to respond in a sequence that validates
the connection in the middle of process_session(), that it is detected
after the analysers, and that the newly assigned CF_READ_ATTACHED is
not used to detect that the request analysers need to be called again,
then the flag is lost.
The CF_WAKE_WRITE flag doesn't work either because it's cleared upon
entry into process_session(), ie if we spend more than one call not
connecting.
Thus we need a new flag to tell the connection initiator that we are
specifically interested in being notified about connection establishment.
This new flag is CF_WAKE_CONNECT. It is set by the requester, and is
cleared once the connection succeeds, where CF_WAKE_ONCE is set instead,
causing the request analysers to be scanned again.
For future versions, some better options will have to be considered :
- let all analysers subscribe to both request and response events ;
- let analysers subscribe to stream interface events (reduces number
of useless calls)
- change CF_WAKE_WRITE's semantics to persist across calls to
process_session(), but that is different from validating a
connection establishment (eg: no data sent, or no data to send)
The bug was introduced in 1.5-dev23, no backport is needed.
Till now we used to return a pointer to a rule, but that makes it
complicated to later add support for registering new actions which
may fail. For example, the redirect may fail if the response is too
large to fit into the buffer.
So instead let's return a verdict. But we needed the pointer to the
last rule to get the address of a redirect and to get the realm used
by the auth page. So these pieces of code have moved into the function
and they produce a verdict.
Both use exactly the same mechanism, except for the choice of the
default realm to be emitted when none is selected. It can be achieved
by simply comparing the ruleset with the stats' for now. This achieves
a significant code reduction and further, removes the dependence on
the pointer to the final rule in the caller.
Some of the remaining interleaving of request processing after the
http-request rules can now safely be removed, because all remaining
actions are mutually exclusive.
So we can move together all those related to an intercepting rule,
then proceed with stats, then with req*.
We still keep an issue with stats vs reqrep which forces us to
keep the stats split in two (detection and action). Indeed, from the
beginning, stats are detected before rewriting and not after. But a
reqdeny rule would stop stats, so in practice we have to first detect,
then perform the action. Maybe we'll be able to kill this in version
1.6.
Till now the Connection header was processed in the middle of the http-request
rules and some reqadd rules. It used to force some http-request actions to be
cut in two parts.
Now with keep-alive, not only that doesn't make any sense anymore, but it's
becoming a total mess, especially since we need to know the headers contents
before proceeding with most actions.
The real reason it was not moved earlier is that the "block" or "http-request"
rules can see a different version if some fields are changed there. But that
is already not reliable anymore since the values observed by the frontend
differ from those in the backend.
This patch is the equivalent of commit f118d9f ("REORG: http: move HTTP
Connection response header parsing earlier") but for the request side. It
has been tagged MEDIUM as it could theorically slightly affect some setups
relying on corner cases or invalid setups, though this does not make real
sense and is highly unlikely.
The session's backend request counters were incremented after the block
rules while these rules could increment the session's error counters,
meaning that we could have more errors than requests reported in a stick
table! Commit 5d5b5d8 ("MEDIUM: proto_tcp: add support for tracking L7
information") is the most responsible for this.
This bug is 1.5-specific and does not need any backport.
"block" rules used to build the whole response and forgot to increment
the denied_req counters. By jumping to the general "deny" label created
in previous patch, it's easier to fix this.
The issue was already present in 1.3 and remained unnoticed, in part
because few people use "block" nowadays.
Continue the cleanup of http-request post-processing to remove some
of the interleaved tests. Here we set up a few labels to deal with
the deny and tarpit actions and avoid interleaved ifs.
We still have a plate of spaghetti in the request processing rules.
All http-request rules are executed at once, then some responses are
built interlaced with other rules that used to be there in the past.
Here, reqadd is executed after an http-req redirect rule is *decided*,
but before it is *executed*.
So let's match the doc and config checks, to put the redirect actually
before the reqadd completely.
There's no point in open-coding the sending of 100-continue in
the stats initialization code, better simply rely on the function
designed to process the message body which already does it.
Commit 844a7e7 ("[MEDIUM] http: add support for proxy authentication")
merged in v1.4-rc1 added the ability to emit a status code 407 in auth
responses, but forgot to set the same status in the logs, which still
contain 401.
The bug is harmless, no backport is needed.
Last fix did address the issue for inlined patterns, but it was not
enough because the flags are lost as well when updating patterns
dynamically over the CLI.
Also if the same file was used once with -i and another time without
-i, their references would have been merged and both would have used
the same matching method.
It's appear that the patterns have two types of flags. The first
ones are relative to the pattern matching, and the second are
relative to the pattern storage. The pattern matching flags are
the same for all the patterns of one expression. Now they are
stored in the expression. The storage flags are information
returned by the pattern mathing function. This information is
relative to each entry and is stored in the "struct pattern".
Now, the expression matching flags are forwarded to the parse
and index functions. These flags are stored during the
configuration parsing, and they are used during the parse and
index actions.
This issue was introduced in dev23 with the major pattern rework,
and is a continuation of commit a631fc8 ("BUG/MAJOR: patterns: -i
and -n are ignored for inlined patterns"). No backport is needed.
These flags are only passed to pattern_read_from_file() which
loads the patterns from a file. The functions used to parse the
patterns from the current line do not provide the means to pass
the pattern flags so they're lost.
This issue was introduced in dev23 with the major pattern rework,
and was reported by Graham Morley. No backport is needed.
The latest commit about set-map/add-acl/... causes this warning for
me :
src/proto_http.c: In function 'parse_http_req_cond':
src/proto_http.c:8863: warning: implicit declaration of function 'strndup'
src/proto_http.c:8863: warning: incompatible implicit declaration of built-in function 'strndup'
src/proto_http.c:8890: warning: incompatible implicit declaration of built-in function 'strndup'
src/proto_http.c:8917: warning: incompatible implicit declaration of built-in function 'strndup'
src/proto_http.c:8944: warning: incompatible implicit declaration of built-in function 'strndup'
Use my_strndup() instead of strndup() which is not portable. No backport
needed.
The http_(res|req)_keywords_register() functions allow to register
new keywords.
You need to declare a keyword list:
struct http_req_action_kw_list test_kws = {
.scope = "testscope",
.kw = {
{ "test", parse_test },
{ NULL, NULL },
}
};
and a parsing function:
int parse_test(const char **args, int *cur_arg, struct proxy *px, struct http_req_rule *rule, char **err)
{
rule->action = HTTP_REQ_ACT_CUSTOM_STOP;
rule->action_ptr = action_function;
return 0;
}
http_req_keywords_register(&test_kws);
The HTTP_REQ_ACT_CUSTOM_STOP action stops evaluation of rules after
your rule, HTTP_REQ_ACT_CUSTOM_CONT permits the evaluation of rules
after your rule.
This patch allows manipulation of ACL and MAP content thanks to any
information available in a session: source IP address, HTTP request or
response header, etc...
It's an update "on the fly" of the content of the map/acls. This means
it does not resist to reload or restart of HAProxy.
Commit bed410e ("MAJOR: http: centralize data forwarding in the request path")
has woken up an issue in redirects, where msg->next is not reset when flushing
the input buffer. The result is an attempt to forward a negative amount of
data, making haproxy crash.
This bug does not seem to affect versions prior to dev23, so no backport is
needed.
These ones report a string as "HTTP/1.0" or "HTTP/1.1" depending on the
version of the request message or the response message, respectively.
The purpose is to be able to emit custom log lines reporting this version
in a persistent way.
Currently, the parsing of the HTTP Connection header for the response
is performed at the same place as the rule sets, which means that after
parsing the beginning of the response, we still have no information on
whether the response is keep-alive compatible or not. Let's do that
earlier.
Note that this is the same code that was moved in the previous function,
both of them are always called in a row so no change of behaviour is
expected.
A future change might consist in having a late analyser to perform the
late header changes such as mangling the connection header. It's quite
painful that currently this is mixed with the rest of the processing
such as filters.
This allows the stats page to work in keep-alive mode and to be
compressed. At compression ratios up to 80%, it's quite interesting
for large pages.
We ensure to skip filters because we don't want to unexpectedly block
a response nor to mangle response headers.
All the code inherited from version 1.1 still holds a lot ot sessions
called "t" because in 1.1 they were tasks. This naming is very annoying
and sometimes even confusing, for example in code involving tables.
Let's get rid of this once for all and before 1.5-final.
Nothing changed beyond just carefully renaming these variables.
It's useless to process 100-continue in the middle of response filters
because there's no info in the 100 response itself, and it could even
make things worse. So better use it as it is, an interim response
waiting for the next response, thus we just have to put it into
http_wait_for_response(). That way we ensure to have a valid response
in this function.
Since commit d7ad9f5 ("MAJOR: channel: add a new flag CF_WAKE_WRITE to
notify the task of writes"), we got another bug with 100-continue responses.
If the final response comes in the same packet as the 100, then the rest of
the buffer is not processed since there is no wake-up event.
In fact the change above uncoverred the real culprit which is more
likely session.c which should detect that an earlier analyser was set
and should loop back to it.
A cleaner fix would be better, but setting the flag works fine.
This issue was introduced in 1.5-dev22, no backport is needed.
Patches c623c17 ("MEDIUM: http: start to centralize the forwarding code")
and bed410e ("MAJOR: http: centralize data forwarding in the request path")
merged into 1.5-dev23 cause transfers to be silently aborted after the
server timeout due to the fact that the analysers are woken up when the
timeout strikes and they believe they have nothing more to do, so they're
terminating the transfer.
No backport is needed.
This basically reimplements commit f3221f9 ("MEDIUM: stats: add support
for HTTP keep-alive on the stats page") which was reverted by commit
51437d2 after Igor Chan reported a broken stats page caused by the bug
fix by previous commit.