It happens that openssl's API can differ between versions, causing some
serious trouble if the version used at runtime is not the same as used
for building.
Now we report the two versions separately along with a warning if the
version differs (except the patch version).
Source function will not work with the following line in default section:
source 0.0.0.0 usesrc clientip
even that related settings by iptables and ip rule have been set correctly.
But it can work well in backend setcion.
The reason is that the operation in line 1815 in cfgparse.c as below:
curproxy->conn_src.opts = defproxy.conn_src.opts & ~CO_SRC_TPROXY_MASK;
clears three low bits of conn_src.opts which stores the configuration of
'usesrc'. Without correct bits set, the source address function can not
work well. They should be copied to the backend proxy without being modified.
Since conn_src.tproxy_addr had not copied from defproxy to backend proxy
while initializing backend proxy, source function will not work well
with explicit source address set in default section either.
Signed-off-by: Godbach <nylzhaowei@gmail.com>
Note: the bug was introduced in 1.5-dev16 with commit ef9a3605
In 1.5-dev17 (commit 1facd6d6), we reorganized the way HTTP stats
requests are handled. When moving the code, we dropped a "return 0"
which happens upon incomplete POST request, so we now end up with
the next return 1 which causes processing to go on with next
analyser. This causes incomplete POST requests to try to forward
the request to servers, resulting in either a 404 or a 503 depending
on the configuration.
This patch fixes this regression to restore the previous behaviour.
It's not enough though, as it happens that the stats code is handled
after all http header processing but in the same function. The net
effect is that incomplete requests cause the headers manipulation to
be performed multiple times, possibly resulting in multiple headers
in the request buffer. Since the stats requests are not meant to be
forwarded, it's not an issue yet but this is something to take care
of later.
A remaining issue that's not handled yet is that if the client does
not send the complete POST headers, then the request is finally
forwarded. This is not a regression, it has always been there and
seems to be caused by the lack of timeout processing when waiting
for the POST body. The solution to this issue would be to move the
handling of stats requests into a dedicated analyser placed after
http_process_request_body().
Bug reported by Guillaume de Lafond.
Implements the "res.comp" ACL which is a boolean returning 1 when a
response has been compressed by HAProxy or 0 otherwise.
Implements the "res.comp_algo" fetch which contains the name of the
algorithm HAProxy used to compress the response.
Bryan Talbot reported that a config with only "stats bind-process 1" in
the global section would crash during parsing. This bug was introduced
with this new statement in 1.5-dev13. The stats frontend must be allocated
if it was not yet.
No backport is needed. The workaround consists in having previously declared
any other stats keyword (generally stats socket).
A "soft-stopped" server (weight 0) can be missed easily in the
webinterface. Fix this by using a specific class (and color).
Signed-off-by: Lukas Tribus <luky-37@hotmail.com>
When a server state change happens, a status bar appears with a link. Since
commit 1.5-dev16 (commit e7dbfc66), it was not visible anymore because of
the CSS properties set on the whole "div" tag used by the tooltips. Fix this
by using a specific class for the tooltips.
The confirmation link on the web interface forgets to set the displaying
options, so that when one clicks on "[X] Action processed successfully",
the auto-refresh, up/down and scope search settings are lost. We need to
pass all these info in the new link.
commit 88c278fadf provided a new input field on the statistics page which was
focused by default. The autofocus prevent to scroll down the page with the
keyboard immediately afterloading it, without the need to leave that field.
It also interfered with "stats refresh" by scrolling up to this field on each
refresh.
This small patch removes the autofocus keyword and adds a tabindex="1" as
suggested by Vincent Bernat. It also cleans up the generated html code.
This patch adds a "scope" box in the statistics page in order to
display only proxies with a name that contains the requested value.
The scope filter is preserved across all clicks on the page.
Since ea68d36 we show whether JIT is enabled, based on the USE-flag
(USE_PCRE_JIT). This is too naive; libpcre may be built without JIT
support (which is the default).
Fix this by calling pcre_config(), which has the accurate information
we are looking for.
Example of a libpcre without JIT support after this patch:
> ./haproxy -vv | grep PCRE
> OPTIONS = USE_STATIC_PCRE=1 USE_PCRE_JIT=1
> Built with PCRE version : 8.32 2012-11-30
> PCRE library supports JIT : no (libpcre build without JIT?)
The compression state machine happens to start work it cannot undo if
there's no more data in the input buffer, and has trouble accounting
for it. Fixing it requires more than a few lines, as the confusion is
in part caused by the way the pointers to the various places in the
message are handled internally. So as a temporary fix, let's disable
compression on chunk-encoded responses. This will give us more time
to perform the required changes.
Will Glass-Husain reported this breakage which was caused by commit
dec9814e merged in 1.5-dev12, and which was incomplete, resulting in
both "clear table" and "show table" to do the same thing.
No backport is needed.
Improve error log reporting in the format parser by always giving the
file name, line number, and directive name instead of the hard-coded
"log-format".
Previously we got this when dealing with log-format errors:
[WARNING] 101/183012 (8561) : Warning: log-format variable name 'r' is not suited to HTTP mode
[WARNING] 101/183012 (8561) : log-format: sample fetch <hdr(a,1)> may not be reliably used with 'log-format' because it needs 'HTTP request headers,HTTP response headers' which is not available here.
[WARNING] 101/183012 (8561) : Warning: no such variable name 'k' in log-format
Now we have this :
[WARNING] 101/183016 (8593) : parsing [fmt.cfg:8] : 'log-format' variable name 'r' is reserved for HTTP mode
[WARNING] 101/183016 (8593) : parsing [fmt.cfg:8] : 'log-format' : sample fetch <hdr(a,1)> may not be reliably used here because it needs 'HTTP request headers,HTTP response headers' which is not available here.
[WARNING] 101/183016 (8593) : parsing [fmt.cfg:15] : no such variable name 'k' in 'unique-id-format'
Commit a4312fa2 merged into dev18 improved log-format management by
processing "log-format" and "unique-id-format" where they were declared,
so that the faulty args could be reported with their correct line numbers.
Unfortunately, the log-format parser considers the proxy mode (TCP/HTTP)
and now if the directive is set before the "mode" statement, it can be
rejected and report warnings.
So we really need to parse these directives at the end of a section at
least. Right now we do not have an "end of section" event, so we need
to store the file name and line number for each of these directives,
and take care of them at the end.
One of the benefits is that now the line numbers can be inherited from
the line passing "option httplog" even if it's in a defaults section.
Future improvements should be performed to report line numbers in every
log-format processed by the parser.
When the parameter passed to a consistent hash is not found, we fall back to
round-robin using chash_get_next_server(). This one stores the last visited
server in lbprm.chash.last, which can be NULL upon the first invocation or if
the only server was recently brought up.
The loop used to scan for a server is able to skip the previously attempted
server in case of a redispatch, by passing this previous server in srvtoavoid.
For this reason, the loop stops when the currently considered server is
different from srvtoavoid and different from the original chash.last.
A problem happens in a special sequence : if a connection to a server fails,
then all servers are removed from the farm, then the original server is added
again before the redispatch happens, we have chash.last = NULL and srvtoavoid
set to the only server in the farm. Then this server is always equal to
srvtoavoid and never to NULL, and the loop never stops.
The fix consists in assigning the stop point to the first encountered node if
it was not yet set.
This issue cannot happen with the map-based algorithm since it's based on an
index and not a stop point.
This issue was reported by Henry Qian who kindly provided lots of critically
useful information to figure out the conditions to reproduce the issue.
The fix needs to be backported to 1.4 which is also affected.
Till now we used to call the function until the connection established, which
means that the header rewriting was performed for nothing upon each even (eg:
uploaded contents) until the server responded or timed out.
Now we only call the function when we assign the server.
When syncing data between two haproxy instances, the received keys were
allocated in the stack but no room was planned for type string. So in
practice all strings of 16 or less bytes were properly handled (due to
the union with in6_addr which reserves some room), but above this some
local variables were overwritten. Up to 8 additional bytes could be
written without impact, but random behaviours are expected above, including
crashes and memory corruption.
If large enough strings are allowed in the configuration, it is possible that
code execution could be triggered when the function's return pointer is
overwritten.
A quick workaround consists in ensuring that no stick-table uses a string
larger than 16 bytes (default is 32). Another workaround is to disable peer
sync when strings are in use.
Thanks to Will Glass-Husain for bringing up this issue with a backtrace to
understand the issue.
The bug only affects 1.5-dev3 and above. No backport is needed.
We'll need to centralize some pool_alloc()/pool_free() calls in the
upcoming fix so before that we need to reduce the number of points
by which we leave the critical code.
According to "man pcreapi", pcre_compile() does not accept being passed
a NULL pointer in errptr or erroffset. It immediately returns NULL, causing
any expression to fail. So let's pass real variables and make use of them
to improve error reporting.
When an ACL keyword needs a mandatory argument and this argument is of
type proxy or table, it is allowed not to specify it so that current
proxy is used by default.
In order to achieve this, the ACL expression parser builds a dummy
argument from scratch and marks it unresolved.
However, since recent changes on the ACL and samples, an unresolved
argument needs to be added to the unresolved list. This specific code
did not do it, resulting in random data being used as a proxy pointer
if no argument was passed for a proxy name, possibly even causing a
crash.
A quick workaround consists explicitly naming proxies in ACLs.
Commit 96199b10 reintroduced the splice() mechanism in the new connection
system. However, it failed to account for the number of transferred bytes,
allowing more bytes than scheduled to be transferred to the client. This
can cause an issue with small-chunked responses, where each packet from
the server may contain several chunks, because a single splice() call may
succeed, then try to splice() a second time as the pipe is not full, thus
consuming the next chunk size.
This patch also reverts commit baf2a5 ("OPTIM: splice: detect shutdowns...")
because it introduced a related regression. The issue is that splice() may
return less data than available also if the pipe is full, so having EPOLLRDHUP
after splice() returns less than expected is not a sufficient indication that
the input is empty.
In both cases, the issue may be detected by the presence of "SD" termination
flags in the logs, and worked around by disabling splicing (using "-dS").
This problem was reported by Sander Klein, and no backport is needed.
haproxy -vv shows build informations about USE flags and lib versions.
This patch introduces informations about PCRE and the new JIT feature.
It also makes USE_PCRE_JIT=1 appear in the haproxy -vv "OPTIONS".
This is useful since with the introduction of JIT we will see libpcre
related issues.
Sander Klein reported this bug. The test for the extra argument on these
rules prevent any condition from being added. The bug was introduced with
the feature itself in 1.5-dev16.
Released version 1.5-dev18 with the following main changes :
- DOCS: Add explanation of intermediate certs to crt paramater
- DOC: typo and minor fixes in compression paragraph
- MINOR: config: http-request configuration error message misses new keywords
- DOC: minor typo fix in documentation
- BUG/MEDIUM: ssl: ECDHE ciphers not usable without named curve configured.
- MEDIUM: ssl: add bind-option "strict-sni"
- MEDIUM: ssl: add mapping from SNI to cert file using "crt-list"
- MEDIUM: regex: Use PCRE JIT in acl
- DOC: simplify bind option "interface" explanation
- DOC: tfo: bump required kernel to linux-3.7
- BUILD: add explicit support for TFO with USE_TFO
- MEDIUM: New cli option -Ds for systemd compatibility
- MEDIUM: add haproxy-systemd-wrapper
- MEDIUM: add systemd service
- BUG/MEDIUM: systemd-wrapper: don't leak zombie processes
- BUG/MEDIUM: remove supplementary groups when changing gid
- BUG/MEDIUM: config: fix parser crash with bad bind or server address
- BUG/MINOR: Correct logic in cut_crlf()
- CLEANUP: checks: Make desc argument to set_server_check_status const
- CLEANUP: dumpstats: Make cli_release_handler() static
- MEDIUM: server: Break out set weight processing code
- MEDIUM: server: Allow relative weights greater than 100%
- MEDIUM: server: Tighten up parsing of weight string
- MEDIUM: checks: Add agent health check
- BUG/MEDIUM: ssl: openssl 0.9.8 doesn't open /dev/random before chroot
- BUG/MINOR: time: frequency counters are not totally accurate
- BUG/MINOR: http: don't process abortonclose when request was sent
- BUG/MEDIUM: stream_interface: don't close outgoing connections on shutw()
- BUG/MEDIUM: checks: ignore late resets after valid responses
- DOC: fix bogus recommendation on usage of gpc0 counter
- BUG/MINOR: http-compression: lookup Cache-Control in the response, not the request
- MINOR: signal: don't block SIGPROF by default
- OPTIM: epoll: make use of EPOLLRDHUP
- OPTIM: splice: detect shutdowns and avoid splice() == 0
- OPTIM: splice: assume by default that splice is working correctly
- BUG/MINOR: log: temporary fix for lost SSL info in some situations
- BUG/MEDIUM: peers: only the last peers section was used by tables
- BUG/MEDIUM: config: verbosely reject peers sections with multiple local peers
- BUG/MINOR: epoll: use a fix maxevents argument in epoll_wait()
- BUG/MINOR: config: fix improper check for failed memory alloc in ACL parser
- BUG/MINOR: config: free peer's address when exiting upon parsing error
- BUG/MINOR: config: check the proper variable when parsing log minlvl
- BUG/MEDIUM: checks: ensure the health_status is always within bounds
- BUG/MINOR: cli: show sess should always validate s->listener
- BUG/MINOR: log: improper NULL return check on utoa_pad()
- CLEANUP: http: remove a useless null check
- CLEANUP: tcp/unix: remove useless NULL check in {tcp,unix}_bind_listener()
- BUG/MEDIUM: signal: signal handler does not properly check for signal bounds
- BUG/MEDIUM: tools: off-by-one in quote_arg()
- BUG/MEDIUM: uri_auth: missing NULL check and memory leak on memory shortage
- BUG/MINOR: unix: remove the 'level' field from the ux struct
- CLEANUP: http: don't try to deinitialize http compression if it fails before init
- CLEANUP: config: slowstart is never negative
- CLEANUP: config: maxcompcpuusage is never negative
- BUG/MEDIUM: log: emit '-' for empty fields again
- BUG/MEDIUM: checks: fix a race condition between checks and observe layer7
- BUILD: fix a warning emitted by isblank() on non-c99 compilers
- BUILD: improve the makefile's support for libpcre
- MEDIUM: halog: add support for counting per source address (-ic)
- MEDIUM: tools: make str2sa_range support all address syntaxes
- MEDIUM: config: make use of str2sa_range() instead of str2sa()
- MEDIUM: config: use str2sa_range() to parse server addresses
- MEDIUM: config: use str2sa_range() to parse peers addresses
- MINOR: tests: add a config file to ease address parsing tests.
- MINOR: ssl: add a global tunable for the max SSL/TLS record size
- BUG/MINOR: syscall: fix NR_accept4 system call on sparc/linux
- BUILD/MINOR: syscall: add definition of NR_accept4 for ARM
- MINOR: config: report missing peers section name
- BUG/MEDIUM: tools: fix bad character handling in str2sa_range()
- BUG/MEDIUM: stats: never apply "unix-bind prefix" to the global stats socket
- MINOR: tools: prepare str2sa_range() to return an error message
- BUG/MEDIUM: checks: don't call connect() on unsupported address families
- MINOR: tools: prepare str2sa_range() to accept a prefix
- MEDIUM: tools: make str2sa_range() parse unix addresses too
- MEDIUM: config: make str2listener() use str2sa_range() to parse unix addresses
- MEDIUM: config: use a single str2sa_range() call to parse bind addresses
- MEDIUM: config: use str2sa_range() to parse log addresses
- CLEANUP: tools: remove str2sun() which is not used anymore.
- MEDIUM: config: add complete support for str2sa_range() in dispatch
- MEDIUM: config: add complete support for str2sa_range() in server addr
- MEDIUM: config: add complete support for str2sa_range() in 'server'
- MEDIUM: config: add complete support for str2sa_range() in 'peer'
- MEDIUM: config: add complete support for str2sa_range() in 'source' and 'usesrc'
- CLEANUP: minor cleanup in str2sa_range() and str2ip()
- CLEANUP: config: do not use multiple errmsg at once
- MEDIUM: tools: support specifying explicit address families in str2sa_range()
- MAJOR: listener: support inheriting a listening fd from the parent
- MAJOR: tools: support environment variables in addresses
- BUG/MEDIUM: http: add-header should not emit "-" for empty fields
- BUG/MEDIUM: config: ACL compatibility check on "redirect" was wrong
- BUG/MEDIUM: http: fix another issue caused by http-send-name-header
- DOC: mention the new HTTP 307 and 308 redirect statues
- MEDIUM: poll: do not use FD_* macros anymore
- BUG/MAJOR: ev_select: disable the select() poller if maxsock > FD_SETSIZE
- BUG/MINOR: acl: ssl_fc_{alg,use}_keysize must parse integers, not strings
- BUG/MINOR: acl: ssl_c_used, ssl_fc{,_has_crt,_has_sni} take no pattern
- BUILD: fix usual isdigit() warning on solaris
- BUG/MEDIUM: tools: vsnprintf() is not always reliable on Solaris
- OPTIM: buffer: remove one jump in buffer_count()
- OPTIM: http: improve branching in chunk size parser
- OPTIM: http: optimize the response forward state machine
- BUILD: enable poll() by default in the makefile
- BUILD: add explicit support for Mac OS/X
- BUG/MAJOR: http: use a static storage for sample fetch context
- BUG/MEDIUM: ssl: improve error processing and reporting in ssl_sock_load_cert_list_file()
- BUG/MAJOR: http: fix regression introduced by commit a890d072
- BUG/MAJOR: http: fix regression introduced by commit d655ffe
- BUG/CRITICAL: using HTTP information in tcp-request content may crash the process
- MEDIUM: acl: remove flag ACL_MAY_LOOKUP which is improperly used
- MEDIUM: samples: use new flags to describe compatibility between fetches and their usages
- MINOR: log: indicate it when some unreliable sample fetches are logged
- MEDIUM: samples: move payload-based fetches and ACLs to their own file
- MINOR: backend: rename sample fetch functions and declare the sample keywords
- MINOR: frontend: rename sample fetch functions and declare the sample keywords
- MINOR: listener: rename sample fetch functions and declare the sample keywords
- MEDIUM: http: unify acl and sample fetch functions
- MINOR: session: rename sample fetch functions and declare the sample keywords
- MAJOR: acl: make all ACLs reference the fetch function via a sample.
- MAJOR: acl: remove the arg_mask from the ACL definition and use the sample fetch's
- MAJOR: acl: remove fetch argument validation from the ACL struct
- MINOR: http: add new direction-explicit sample fetches for headers and cookies
- MINOR: payload: add new direction-explicit sample fetches
- CLEANUP: acl: remove ACL hooks which were never used
- MEDIUM: proxy: remove acl_requires and just keep a flag "http_needed"
- MINOR: sample: provide a function to report the name of a sample check point
- MAJOR: acl: convert all ACL requires to SMP use+val instead of ->requires
- CLEANUP: acl: remove unused references to ACL_USE_*
- MINOR: http: replace acl_parse_ver with acl_parse_str
- MEDIUM: acl: move the ->parse, ->match and ->smp fields to acl_expr
- MAJOR: acl: add option -m to change the pattern matching method
- MINOR: acl: remove the use_count in acl keywords
- MEDIUM: acl: have a pointer to the keyword name in acl_expr
- MEDIUM: acl: support using sample fetches directly in ACLs
- MEDIUM: http: remove val_usr() to validate user_lists
- MAJOR: sample: maintain a per-proxy list of the fetch args to resolve
- MINOR: ssl: add support for the "alpn" bind keyword
- MINOR: http: status code 303 is HTTP/1.1 only
- MEDIUM: http: implement redirect 307 and 308
- MINOR: http: status 301 should not be marked non-cacheable
The ALPN extension is meant to replace the now deprecated NPN extension.
This patch implements support for it. It requires a version of openssl
with support for this extension. Patches are available here right now :
http://html5labs.interopbridges.com/media/167447/alpn_patches.zip
While ACL args were resolved after all the config was parsed, it was not the
case with sample fetch args because they're almost everywhere now.
The issue is that ACLs now solely rely on sample fetches, so their args
resolving doesn't work anymore. And many fetches involving a server, a
proxy or a userlist don't work at all.
The real issue is that at the bottom layers we have no information about
proxies, line numbers, even ACLs in order to report understandable errors,
and that at the top layers we have no visibility over the locations where
fetches are referenced (think log node).
After failing multiple unsatisfying solutions attempts, we now have a new
concept of args list. The principle is that every proxy has a list head
which contains a number of indications such as the config keyword, the
context where it's used, the file and line number, etc... and a list of
arguments. This list head is of the same type as the elements, so it
serves as a template for adding new elements. This way, it is filled from
top to bottom by the callers with the information they have (eg: line
numbers, ACL name, ...) and the lower layers just have to duplicate it and
add an element when they face an argument they cannot resolve yet.
Then at the end of the configuration parsing, a loop passes over each
proxy's list and resolves all the args in sequence. And this way there is
all necessary information to report verbose errors.
The first immediate benefit is that for the first time we got very precise
location of issues (arg number in a keyword in its context, ...). Second,
in order to do this we had to parse log-format and unique-id-format a bit
earlier, so that was a great opportunity for doing so when the directives
are encountered (unless it's a default section). This way, the recorded
line numbers for these args are the ones of the place where the log format
is declared, not the end of the file.
Userlists report slightly more information now. They're the only remaining
ones in the ACL resolving function.
Now it becomes possible to directly use sample fetches as the ACL fetch
methods. In this case, the matching method is mandatory. This allows to
form more ACL combinations from existing fetches and will limit the need
for new ACLs when everything is available to form them from sample fetches
and matches.
The acl_expr struct used to hold a pointer to the ACL keyword. But since
we now have all the relevant pointers, we don't need that anymore, we just
need the pointer to the keyword as a string in order to return warnings
and error messages.
So let's change this in order to remove the dependency on the acl_keyword
struct from acl_expr.
During this change, acl_cond_kw_conflicts() used to return a pointer to an
ACL keyword but had to be changed to return a const char* for the same reason.
ACL expressions now support "-m" in addition to "-i" and "-f". This new
option is followed by the name of the pattern matching method to be used
on the extracted pattern. This makes it possible to reuse existing sample
fetch methods with other matching methods (eg: regex). A "found" matching
method ignores any pattern and only verifies that the required sample was
found (useful for cookies).
The HTTP version parser used in ACLs has long been a string and
still had its own parser. This makes no sense, switch it to use
the standard string parser.
The ACLs now use the fetch's ->use and ->val to decide upon compatibility
between the place where they are used and where the information are fetched.
The code is capable of reporting warnings about very fine incompatibilities
between certain fetches and an exact usage location, so it is expected that
some new warnings will be emitted on some existing configurations.
Two degrees of detection are provided :
- detecting ACLs that never match
- detecting keywords that are ignored
All tests show that this seems to work well, though bugs are still possible.
Proxy's acl_requires was a copy of all bits taken from ACLs, but we'll
get rid of ACL flags and only rely on sample fetches soon. The proxy's
acl_requires was only used to allocate an HTTP context when needed, and
was even forced in HTTP mode. So better have a flag which exactly says
what it's supposed to be used for.
These hooks, which established the relation between ACL_USE_* and the location
where the ACL were used, were never used because they were superseded with the
sample capabilities. Remove them now.
Similarly to previous commit fixing "hdr" and "cookie" in HTTP, we have to deal
with "payload" and "payload_lv" which are request-only for ACLs and req/resp for
sample fetches depending on the context, and to a less extent with other req_*
and rep_*/rep_* fetches. So let's add explicit "req." and "res." variants and
make the ACLs rely on that instead.
Since "hdr" and "cookie" were ambiguously referring to the request or response
depending on the context, we need a way to explicitly specify the direction.
By prefixing the fetches names with "req." and "res.", we can now restrict such
fetches to the appropriate direction. At the moment the fetches are explicitly
declared by later we might think about having an automatic match when "req." or
"res." appears. These explicit fetches are now used by the relevant ACLs.
ACL fetch being inherited from the sample fetch keyword, we don't need
anymore to specify what function to use to validate the fetch arguments.
Note that the job is still done in the ACL parsing code based on elements
from the sample fetch structs.
Now that ACLs solely rely on sample fetch functions, make them use the
same arg mask. All inconsistencies have been fixed separately prior to
this patch, so this patch almost only adds a new pointer indirection
and removes all references to ARG*() in the definitions.
The parsing is still performed by the ACL code though.
ACL fetch functions used to directly reference a fetch function. Now
that all ACL fetches have their sample fetches equivalent, we can make
ACLs reference a sample fetch keyword instead.
In order to simplify the code, a sample keyword name may be NULL if it
is the same as the ACL's, which is the most common case.
A minor change appeared, http_auth always expects one argument though
the ACL allowed it to be missing and reported as such afterwards, so
fix the ACL to match this. This is not really a bug.
The following sample fetch functions were only usable by ACLs but are now
usable by sample fetches too :
cook, cook_cnt, cook_val, hdr_cnt, hdr_ip, hdr_val, http_auth,
http_auth_group, http_first_req, method, req_proto_http, req_ver,
resp_ver, scook, scook_cnt, scook_val, shdr, shdr_cnt, shdr_ip,
shdr_val, status, urlp, urlp_val,
Most of them won't bring much benefit at the moment, or are even aliases of
existing ones, however they'll be needed for ACL->SMP convergence.
A new val_usr() function was added to resolve userlist names into pointers.
The http_auth_group ACL forgot to make its first argument mandatory, so
there was a check in cfgparse to report a vague error. Now that args are
correctly parsed, let's report something more precise.
All urlp* ACLs now support an optional 3rd argument like their sample
counter-part which is the optional delimiter.
The fetch functions have been renamed "smp_fetch_*".
Some args controls on the sample keywords have been relaxed so that we
can soon use them for ACLs :
- cookie now accepts to have an optional name ; it will return the
first matching cookie if the name is not set ;
- same for set-cookie and hdr
The following sample fetch functions were only usable by ACLs but are now
usable by sample fetches too :
dst_conn, so_id,
The fetch functions have been renamed "smp_fetch_*".
The following sample fetch functions were only usable by ACLs but are now
usable by sample fetches too :
fe_conn, fe_id, fe_sess_rate
The fetch functions have been renamed "smp_fetch_*".
The following sample fetch functions were only usable by ACLs but are now
usable by sample fetches too :
avg_queue, be_conn, be_id, be_sess_rate, connslots, nbsrv,
queue, srv_conn, srv_id, srv_is_up, srv_sess_rate
The fetch functions have been renamed "smp_fetch_*".
The file acl.c is a real mess, it both contains functions to parse and
process ACLs, and some sample extraction functions which act on buffers.
Some other payload analysers were arbitrarily dispatched to proto_tcp.c.
So now we're moving all payload-based fetches and ACLs to payload.c
which is capable of extracting data from buffers and rely on everything
that is protocol-independant. That way we can safely inflate this file
and only use the other ones when some fetches are really specific (eg:
HTTP, SSL, ...).
As a result of this cleanup, the following new sample fetches became
available even if they're not really useful :
always_false, always_true, rep_ssl_hello_type, rdp_cookie_cnt,
req_len, req_ssl_hello_type, req_ssl_sni, req_ssl_ver, wait_end
The function 'acl_fetch_nothing' was wrong and never used anywhere so it
was removed.
The "rdp_cookie" sample fetch used to have a mandatory argument while it
was optional in ACLs, which are supposed to iterate over RDP cookies. So
we're making it optional as a fetch too, and it will return the first one.
If a log-format involves some sample fetches that may not be present at
the logging instant, we can now report a warning.
Note that this is done both for log-format and for add-header and carefully
respects the original fetch keyword's capabilities.
Samples fetches were relying on two flags SMP_CAP_REQ/SMP_CAP_RES to describe
whether they were compatible with requests rules or with response rules. This
was never reliable because we need a finer granularity (eg: an HTTP request
method needs to parse an HTTP request, and is available past this point).
Some fetches are also dependant on the context (eg: "hdr" uses request or
response depending where it's involved, causing some abiguity).
In order to solve this, we need to precisely indicate in fetches what they
use, and their users will have to compare with what they have.
So now we have a bunch of bits indicating where the sample is fetched in the
processing chain, with a few variants indicating for some of them if it is
permanent or volatile (eg: an HTTP status is stored into the transaction so
it is permanent, despite being caught in the response contents).
The fetches also have a second mask indicating their validity domain. This one
is computed from a conversion table at registration time, so there is no need
for doing it by hand. This validity domain consists in a bitmask with one bit
set for each usage point in the processing chain. Some provisions were made
for upcoming controls such as connection-based TCP rules which apply on top of
the connection layer but before instantiating the session.
Then everywhere a fetch is used, the bit for the control point is checked in
the fetch's validity domain, and it becomes possible to finely ensure that a
fetch will work or not.
Note that we need these two separate bitfields because some fetches are usable
both in request and response (eg: "hdr", "payload"). So the keyword will have
a "use" field made of a combination of several SMP_USE_* values, which will be
converted into a wider list of SMP_VAL_* flags.
The knowledge of permanent vs dynamic information has disappeared for now, as
it was never used. Later we'll probably reintroduce it differently when
dealing with variables. Its only use at the moment could have been to avoid
caching a dynamic rate measurement, but nothing is cached as of now.
This flag is used on ACL matches that support being looking up patterns
in trees. At the moment, only strings and IPs support tree-based lookups,
but the flag is randomly set also on integers and binary data, and is not
even always set on strings nor IPs.
Better get rid of this mess by only relying on the matching function to
decide whether or not it supports tree-based lookups, this is safer and
easier to maintain.
During normal HTTP request processing, request buffers are realigned if
there are less than global.maxrewrite bytes available after them, in
order to leave enough room for rewriting headers after the request. This
is done in http_wait_for_request().
However, if some HTTP inspection happens during a "tcp-request content"
rule, this realignment is not performed. In theory this is not a problem
because empty buffers are always aligned and TCP inspection happens at
the beginning of a connection. But with HTTP keep-alive, it also happens
at the beginning of each subsequent request. So if a second request was
pipelined by the client before the first one had a chance to be forwarded,
the second request will not be realigned. Then, http_wait_for_request()
will not perform such a realignment either because the request was
already parsed and marked as such. The consequence of this, is that the
rewrite of a sufficient number of such pipelined, unaligned requests may
leave less room past the request been processed than the configured
reserve, which can lead to a buffer overflow if request processing appends
some data past the end of the buffer.
A number of conditions are required for the bug to be triggered :
- HTTP keep-alive must be enabled ;
- HTTP inspection in TCP rules must be used ;
- some request appending rules are needed (reqadd, x-forwarded-for)
- since empty buffers are always realigned, the client must pipeline
enough requests so that the buffer always contains something till
the point where there is no more room for rewriting.
While such a configuration is quite unlikely to be met (which is
confirmed by the bug's lifetime), a few people do use these features
together for very specific usages. And more importantly, writing such
a configuration and the request to attack it is trivial.
A quick workaround consists in forcing keep-alive off by adding
"option httpclose" or "option forceclose" in the frontend. Alternatively,
disabling HTTP-based TCP inspection rules enough if the application
supports it.
At first glance, this bug does not look like it could lead to remote code
execution, as the overflowing part is controlled by the configuration and
not by the user. But some deeper analysis should be performed to confirm
this. And anyway, corrupting the process' memory and crashing it is quite
trivial.
Special thanks go to Yves Lafon from the W3C who reported this bug and
deployed significant efforts to collect the relevant data needed to
understand it in less than one week.
CVE-2013-1912 was assigned to this issue.
Note that 1.4 is also affected so the fix must be backported.
Sander Klein reported that since last snapshot, some downloads would
hang from nginx but succeed from apache. The culprit was not too hard
to find given the low number of recent changes affecting the data path.
Commit d655ffe slightly reorganized the HTTP state machine and
introduced this regression. The reason is that we must never jump
into the MSG_DONE case without first flushing remaining data because
this is not done anymore afterwards. This part is scheduled for
being reorganized since it's totally ugly especially since we added
compression, and this regression is an illustration of its readability.
The issue is entirely dependant on the server close sequence, which
explains why it was reproducible only with nginx here.
This commit fixed a bug and introduced a new one at the same time.
It's a stupid typo, the index to store the context is [0], not [2].
The effect is that parsing the header can loop forever if multiple
headers are found. This issue was reported by Lukas Tribus.
fe61656b added the ability to load a list of certificates from a file,
but error control was incomplete and misleading, as some errors such
as missing files were not reported, and errors reported with Alert()
instead of memprintf() were inappropriate and mixed with upper errors.
Also, the code really supports a single SNI filter right now, so let's
correct it and the doc for that, leaving room for later change if needed.
It designates a list of PEM file with an optional list of SNI filter
per certificate, with the following format for each line :
<crtfile>[ <snifilter>]*
Wildcards are supported in the SNI filter. The certificates will be
presented to clients who provide a valid TLS Server Name Indication
field matching one of SNI filter. If no SNI filter is specified the
CN and alt subjects are used.
Formerly, if A was replaced by B, and then B by C before
A finished exiting, we didn't wait for B to finish so it
ended up as a zombie process.
Fix this by waiting randomly every child we spawn.
Signed-off-by: Marc-Antoine Perennou <Marc-Antoine@Perennou.com>
Baptiste Assmann reported that the cook*() ACLs do not work anymore.
The reason is the way we store the hdr_ctx between subsequent calls
to smp_fetch_cookie() since commit 3740635b (1.5-dev10).
The smp->ctx.a[] storage holds up to 8 pointers. It is not meant for
generic storage. We used to store hdr_ctx in the ctx, but while it used
to just fit for smp_fetch_hdr(), it does not for smp_fetch_cookie()
since we stored it at offset 2.
The correct solution is to use this storage to store a pointer to the
current hdr_ctx struct which is statically allocated.
Seen on Solaris 8, calling vsnprintf() with a null-size results
in the output size not being computed. This causes some random
behaviour including crashes when trying to display error messages
when loading an invalid configuration.
src/standard.c: In function `str2sa_range':
src/standard.c:734: warning: subscript has type `char'
This one was recently introduced by commit c120c8d3.
Some recent glibc updates have added controls on FD_SET/FD_CLR/FD_ISSET
that crash the program if it tries to use a file descriptor larger than
FD_SETSIZE.
For this reason, we now control the compatibility between global.maxsock
and FD_SETSIZE, and refuse to use select() if there too many FDs are
expected to be used. Note that on Solaris, FD_SETSIZE is already forced
to 65536, and that FreeBSD and OpenBSD allow it to be redefined, though
this is not needed thanks to kqueue which is much more efficient.
In practice, since poll() is enabled on all targets, it should not cause
any problem, unless it is explicitly disabled.
This change must be backported to 1.4 because the crashes caused by glibc
have already been reported on this version.
Some recent glibc updates have added controls on FD_SET/FD_CLR/FD_ISSET
that crash the program if it tries to use a file descriptor larger than
FD_SETSIZE.
Do not rely on FD_* macros anymore and replace them with bit fields.
An issue reported by David Coulson is that when using http-send-name-header,
the response processing would randomly be performed. The issue was first
diagnosed by Cyril Bonté as being related to a time race when processing
the closing of the response.
In practice, the issue is a bit trickier. It happens that
http_send_name_header() did not update msg->sol after a rewrite. This
counter is supposed to point to the beginning of the message's body
once headers are scheduled for being forwarded. And not updating it
means that the first forwarding of the request headers in
http_request_forward_body() does not send the correct count, leaving
some bytes in chn->to_forward.
Then if the server sends its response in a single packet with the
close, the stream interface switches to state SI_ST_DIS which in
turn moves to SI_ST_CLO in process_session(), and to close the
outgoing connection. This is detected by http_request_forward_body(),
which then switches the request message to the error state, and syncs
all FSMs and removes any response analyser.
The response analyser being removed, no processing is performed on
the response buffer, which is tunnelled as-is to the client.
Of course, the correct fix consists in having http_send_name_header()
update msg->sol. Normally this ought not to have been needed, but it
is an abuse to modify data already scheduled for being forwarded, so
it is expected that such specific handling has to be done there. Better
not have generic functions deal with such cases, so that it does not
become the standard.
Note: 1.4 does not have this issue even if it does not update the
pointer either, because it forwards from msg->som which is not
updated at the moment the connect() succeeds. So no backport is
required.
The check was made on "cond" instead of "rule->cond", so it never
emitted any warning since either the rule was NULL or it was set to
the last condition met.
This is 1.5-specific and the bug was introduced by commit 4baae248
in 1.5-dev17, so no backport is needed.
Patch 6cbbdbf3 fixed the missing "-" delimitors in logs but it caused
them to be emitted with "http-request add-header", eventhough it was
correctly fixed for the unique-id format. Fix this by simply removing
LOG_OPT_MANDATORY in this case.
Now that all addresses are parsed using str2sa_range(), it becomes easy
to add support for environment variables and use them everywhere an address
is needed. Environment variables are used as $VAR or ${VAR} as in shell.
Any number of variables may compose an address, allowing various fantasies
such as "fd@${FD_HTTP}" or "${LAN_DC1}.1:80".
These ones are usable in logs, bind, servers, peers, stats socket, source,
dispatch, and check address.
Using the address syntax "fd@<num>", a listener may inherit a file
descriptor that the caller process has already bound and passed as
this number. The fd's socket family is detected using getsockname(),
and the usual initialization is performed through the existing code
for that family, but the socket creation is skipped.
Whether the parent has performed the listen() call or not is not
important as this is detected.
For UNIX sockets, we immediately clear the path after preparing a
socket so that we never remove it in case an abort would happen due
to a late error during startup.
This change allows one to force the address family in any address parsed
by str2sa_range() by specifying it as a prefix followed by '@' then the
address. Currently supported address prefixes are 'ipv4@', 'ipv6@', 'unix@'.
This also helps forcing resolving for host names (when getaddrinfo is used),
and force the family of the empty address (eg: 'ipv4@' = 0.0.0.0 while
'ipv6@' = ::).
The main benefits is that unix sockets can now get a local name without
being forced to begin with a slash. This is useful during development as
it is no longer necessary to have stats socket sent to /tmp.
Several of the parsing functions made use of multiple errmsg/err_msg
variables which had to be freed, while there is already one in each
function that is freed upon exit. Adapt the code to use the existing
variable exclusively.
Don't use a statically allocated address both for str2ip and str2sa_range,
use the same. The inet and unix code paths have been splitted a little
better to improve readability.
The 'source' and 'usesrc' statements now completely rely on str2sa_range() to
parse an address. A test is made to ensure that the address family supports
connect().
Now that str2sa_range() knows how to parse UNIX addresses, make str2listener()
use it. It simplifies the function. Next step consists in unifying the error
handling to further simplify the call.
Tests have been done and show that unix sockets are correctly handled, with
and without prefixes, both for global stats and normal "bind" statements.
str2sa_range() now considers that any address beginning with '/' is a UNIX
address. It is compatible with all callers at the moment since all of them
perform this test and use a different parser for such addresses. However,
some parsers (eg: servers) still don't check for unix addresses.
We'll need str2sa_range() to support a prefix for unix sockets. Since
we don't always want to use it (eg: stats socket), let's not take it
unconditionally from global but let the caller pass it.
At the moment, all address families supported on a "server" statement support
a connect() method, but this will soon change with the generalization of
str2sa_range(). Checks currently call ->connect() unconditionally so let's
add a check for this.
The "unix-bind prefix" feature was made for explicit "bind" statements. Since
the stats socket was changed to use str2listener(), it implicitly inherited
from this feature. But both are defined in the global section, and we don't
want them to be position-dependant.
So let's make str2listener() explicitly not apply the unix-bind prefix to the
global stats frontend.
This only affects 1.5-dev so it does not need any backport.
Commit d4448bc8 brought support for parsing port ranges, but invalid
characters are not properly handled and can result in a crash while
parsing the configuration if an invalid character is present in the
port, because the return value is set to NULL then dereferenced.
Add new tunable "tune.ssl.maxrecord".
Over SSL/TLS, the client can decipher the data only once it has received
a full record. With large records, it means that clients might have to
download up to 16kB of data before starting to process them. Limiting the
record size can improve page load times on browsers located over high
latency or low bandwidth networks. It is suggested to find optimal values
which fit into 1 or 2 TCP segments (generally 1448 bytes over Ethernet
with TCP timestamps enabled, or 1460 when timestamps are disabled), keeping
in mind that SSL/TLS add some overhead. Typical values of 1419 and 2859
gave good results during tests. Use "strace -e trace=write" to find the
best value.
This trick was first suggested by Mike Belshe :
http://www.belshe.com/2010/12/17/performance-and-the-tls-record-size/
Then requested again by Ilya Grigorik who provides some hints here :
http://ofps.oreilly.com/titles/9781449344764/_transport_layer_security_tls.html#ch04_00000101
When parsing the config, we now use str2sa_range() to detect when
ranges or port offsets were improperly used. Among the new checks
are "log", "source", "addr", "usesrc" which previously didn't check
for extra parameters.
Right now we have multiple methods for parsing IP addresses in the
configuration. This is quite painful. This patch aims at adapting
str2sa_range() to make it support all formats, so that the callers
perform the appropriate tests on the return values. str2sa() was
changed to simply return str2sa_range().
The output values are now the following ones (taken from the comment
on top of the function).
Converts <str> to a locally allocated struct sockaddr_storage *, and a port
range or offset consisting in two integers that the caller will have to
check to find the relevant input format. The following format are supported :
String format | address | port | low | high
addr | <addr> | 0 | 0 | 0
addr: | <addr> | 0 | 0 | 0
addr:port | <addr> | <port> | <port> | <port>
addr:pl-ph | <addr> | <pl> | <pl> | <ph>
addr:+port | <addr> | <port> | 0 | <port>
addr:-port | <addr> |-<port> | <port> | 0
The detection of a port range or increment by the caller is made by
comparing <low> and <high>. If both are equal, then port 0 means no port
was specified. The caller may pass NULL for <low> and <high> if it is not
interested in retrieving port ranges.
Note that <addr> above may also be :
- empty ("") => family will be AF_INET and address will be INADDR_ANY
- "*" => family will be AF_INET and address will be INADDR_ANY
- "::" => family will be AF_INET6 and address will be IN6ADDR_ANY
- a host name => family and address will depend on host name resolving.
If an address is improperly formated on a bind or server address
and haproxy is built for using getaddrinfo, then a crash may occur
upon the call to freeaddrinfo().
Thanks to Jon Meredith for helping me patch this for SmartOS,
I am not a C/GDB wizard.
Support for server side TFO was actually introduced in linux-3.7,
linux-3.6 just has client support.
This patch fixes documentation and a code comment about the
kernel requirement. It also fixes a wrong tfo related code
comment in src/proto_tcp.c.
Commit a2b9dad introduced use of isblank() which is not present everywhere
and requires -std=c99 or various other defines. Here on gcc-3.4 with glibc
2.3 it emits a warning though it works :
src/checks.c: In function 'event_srv_chk_r':
src/checks.c:1007: warning: implicit declaration of function 'isblank'
This macro matches only 2 values, better replace it with the explicit match.
Support a agent health check performed by opening a TCP socket to a
pre-defined port and reading an ASCII string. The string should have one of
the following forms:
* An ASCII representation of an positive integer percentage.
e.g. "75%"
Values in this format will set the weight proportional to the initial
weight of a server as configured when haproxy starts.
* The string "drain".
This will cause the weight of a server to be set to 0, and thus it will
not accept any new connections other than those that are accepted via
persistence.
* The string "down", optionally followed by a description string.
Mark the server as down and log the description string as the reason.
* The string "stopped", optionally followed by a description string.
This currently has the same behaviour as down (iii).
* The string "fail", optionally followed by a description string.
This currently has the same behaviour as down (iii).
A agent health check may be configured using "option lb-agent-chk".
The use of an alternate check-port, used to obtain agent heath check
information described above as opposed to the port of the service,
may be useful in conjunction with this option.
e.g.
option lb-agent-chk
server http1_1 10.0.0.10:80 check port 10000 weight 100
Signed-off-by: Simon Horman <horms@verge.net.au>
Detect:
* Empty weight string, including no digits before '%' in relative
weight string
* Trailing garbage, including between the last integer and '%'
in relative weights
The motivation for this is to allow the weight string to be safely
logged if successfully parsed by this function
Signed-off-by: Simon Horman <horms@verge.net.au>
Allow relative weights greater than 100%,
capping the absolute value to 256 which is
the largest supported absolute weight.
Signed-off-by: Simon Horman <horms@verge.net.au>
Break out set weight processing code.
This is in preparation for reusing the code.
Also, remove duplicate check in nested if clauses.
{px->lbprm.algo & BE_LB_PROP_DYN) is checked by
the immediate outer if clause, so there is no need
to check it a second time.
Signed-off-by: Simon Horman <horms@verge.net.au>
Currently, to reload haproxy configuration, you have to use "-sf".
There is a problem with this way of doing things. First of all, in the systemd world,
reload commands should be "oneshot" ones, which means they should not be the new main
process but rather a tool which makes a call to it and then exits. With the current approach,
the reload command is the new main command and moreover, it makes the previous one exit.
Systemd only tracks the main program, seeing it ending, it assumes it either finished or failed,
and kills everything remaining as a grabage collector. We then end up with no haproxy running
at all.
This patch adds wrapper around haproxy, no changes at all have been made into it,
so it's not intrusive and doesn't change anything for other hosts. What this wrapper does
is basically launching haproxy as a child, listen to the SIGUSR2 (not to conflict with
haproxy itself) signal, and spawing a new haproxy with "-sf" as a child to relay the
first one.
Signed-off-by: Marc-Antoine Perennou <Marc-Antoine@Perennou.com>
This patch adds a new option "-Ds" which is exactly like "-D", but instead of
forking n times to get n jobs running and then exiting, prefers to wait for all the
children it just created. With this done, haproxy becomes more systemd-compliant,
without changing anything for other systems.
Signed-off-by: Marc-Antoine Perennou <Marc-Antoine@Perennou.com>
When observe layer7 is enabled on a server, a response may cause a server
to be marked down while a check is in progress. When the check finally
completes, the connection is not properly released in process_chk() because
the server states makes it think that no check was in progress due to the
lastly reported failure.
When a new check gets scheduled, it reuses the same connection structure
which is reinitialized. When the server finally closes the previous
connection, epoll_wait() notifies conn_fd_handler() which sees that the
old connection is still referenced by fdtab[fd], but it can not do anything
with this fd which does not match conn->t.sock.fd. So epoll_wait() keeps
reporting this fd forever.
The solution is to always make process_chk() always take care of closing
the connection and not make it rely on the connection layer to so.
Special thanks go to James Cole and Finn Arne Gangstad who encountered
the issue almost at the same time and took care of reporting a very
detailed analysis with rich information to help understand the issue.
Commit 2b0108ad accidently got rid of the ability to emit a "-" for
empty log fields. This can happen for captured request and response
cookies, as well as for fetches. Since we don't want to have this done
for headers however, we set the default log method when parsing the
format. It is still possible to force the desired mode using +M/-M.
Openssl needs to access /dev/urandom to initialize its internal random
number generator. It does so when it needs a random for the first time,
which fails if it is a handshake performed after the chroot(), causing
all SSL incoming connections to fail.
This fix consists in calling RAND_bytes() to produce a random before
the chroot, which will in turn open /dev/urandom before it's too late,
and avoid the issue.
If the random generator fails to work while processing the config,
haproxy now fails with an error instead of causing SSL connections to
fail at runtime.
This new option ensures that there is no possible fallback to a default
certificate if the client does not provide an SNI which is explicitly
handled by a certificate.
In select_compression_response_header(), some tests are rather confusing
as the "fail" label is used to deinitialize the compression context for
the session while it's branched only before initialization succeeds. The
test is always false here and the dereferencing of the comp_algo pointer
which might be null is also confusing. Remove that code which is not needed
anymore since commit ec3e3890 got rid of the latest issues.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
Commit 290e63aa moved the unix parameters out of the global stats socket
to the bind_conf struct. As such the stats admin level was also moved
overthere, but it remained in the stats global section where it was not
used, except by a nasty memcpy() used to initialize the ux struct in the
bind_conf with too large data. Fortunately, the extra data copied were
the previous level over the new level so it did not have any impact, but
it could have been worse.
This bug is 1.5 specific, no backport is needed.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
A test is obviously wrong in uri_auth(). If strdup(pass) returns an error
while strdup(user) passes, the NULL pointer is still stored into the
structure. If the user returns the NULL instead, the allocated memory is
not released before returning the error.
The issue was present in 1.4 so the fix should be backported.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
This function may write the \0 one char too far in the static array.
There is no effect right now as the function has never been used except
maybe in code that was never released. Out-of-tree code might possibly
be affected though (hence the MEDIUM flag).
No backport is needed.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
sig is checked for < 0 or > MAX_SIGNAL, but the signal array is
MAX_SIGNAL in size. At the moment, MAX_SIGNAL is 256. If a system supports
more than MAX_SIGNAL signals, then sending signal MAX_SIGNAL to the process
will corrupt one integer in its memory and might also crash the process.
This bug is also present in 1.4 and 1.3, and the fix must be backported.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
srv cannot be null in http_perform_server_redirect(), as it's taken
from the stream interface's target which is always valid for a
server-based redirect, and it was already dereferenced above, so in
practice, gcc already removes the test anyway.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
utoa_pad() is directly fed into tmplog, which is checked for NULL.
First, when NULLs are possible, they should be put into a temp variable
in order to preserve tmplog, and second, this return value can never be
NULL because the value passed is tv_usec/1000 (between "0" and "999")
with a 4-char output. However better fix the check in case this code gets
improperly copy-pasted for another usage later.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
Currently s->listener is set for all sessions, but this may not remain
the case forever so we already check s->listener for validity. On check
was missed.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
health_adjust() checks for incorrect bounds for the status argument.
With current code, the argument is always a constant from the valid
enum so there is no impact and the check is basically a NOP. However
users running local patches (eg: new checks) might want to recheck
their code.
This fix should be backported to 1.4 which introduced the issue.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
logsrv->minlvl gets the numeric log level from the equivalent string.
Upon error, ->level was checked due to a wrong copy-paste. The effect
is that a wrong name will silently be ignored and due to minlvl=-1,
will act as if the option was not set.
No backport is needed, this is 1.5-specific.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
An error caused by an invalid port does not cause the raddr string to
be freed. This is harmless at the moment since we exit, but may have
an impact later if we ever support hot config changes.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
The wrong variable is checked after a calloc() so a memory shortage would
result in a segfault while loading the config instead of a clean error.
This fix may be backported to 1.4 and 1.3 which are both affected.
Reported-by: Dinko Korunic <dkorunic@reflected.net>
epoll_wait() takes a number of returned events, not the number of
fds to consider. We must not pass it the number of the smallest fd,
as it leads to value zero being used, which is invalid in epoll_wait().
The effect may sometimes be observed with peers sections trying to
connect and causing 2-seconds CPU loops upon a soft reload because
epoll_wait() immediately returns -1 EINVAL instead of waiting for the
timeout to happen.
This fix should be backported to 1.4 too (into ev_epoll and ev_sepoll).
If a peers section contains several instances of the local peer name, only
the first one was considered and the next ones were silently ignored. This
can cause some trouble to debug such a configuration. Now the extra entries
are rejected with an error message indicating where the first occurrence was
found.
Without it, haproxy will retain the group membership of root, which may
give more access than intended to the process. For example, haproxy would
still be in the wheel group on Fedora 18, as seen with :
# haproxy -f /etc/haproxy/haproxy.cfg
# ps a -o pid,user,group,command | grep hapr
3545 haproxy haproxy haproxy -f /etc/haproxy/haproxy.cfg
4356 root root grep --color=auto hapr
# grep Group /proc/3545/status
Groups: 0 1 2 3 4 6 10
# getent group wheel
wheelâŒ10:root,misc
[WT: The issue has been investigated by independent security research team
and realized by itself not being able to allow security exploitation.
Additionally, dropping groups is not allowed to unprivileged users,
though this mode of deployment is quite common. Thus a warning is
emitted in this case to inform the user. The fix could be backported
into all supported versions as the issue has always been there. ]
Due to a typo in the peers section lookup code, the last declared peers
section was used instead of the one matching the requested name. This bug
has been there since the very first commit on peers section (1.5-dev2).
When using log-format to log the result of sample fetch functions
which rely on the transport layer (eg: ssl*), we have no way to tell
the proxy not to release the connection before logs have caught the
necessary information. As a result, it happens that logging SSL fetch
functions sometimes doesn't return anything for example if the server
is not available and the connection is immediately aborted.
This issue will be fixed with the upcoming patches to finish handling
of sample fetches.
So for the moment, always mark the LW_XPRT flag on the proxy so that
when any fetch method is used, the proxy does not release the transport
layer too fast.
Versions of splice between 2.6.25 and 2.6.27.12 were bogus and would return EAGAIN
on incoming shutdowns. On these versions, we have to call recv() after such a return
in order to find whether splice is OK or not. Since 2.6.27.13 we don't need to do
this anymore, saving one useless recv() call after each splice() returning EAGAIN,
and we can avoid this logic by defining ASSUME_SPLICE_WORKS.
Building with linux2628 automatically enables splice and the flag above since the
kernel is safe. People enabling splice for custom kernels will be able to disable
this logic by hand too.
Since last commit introducing EPOLLRDHUP, the splicing code is able to
detect an incoming shutdown without calling splice() == 0. This avoids
one useless syscall.
epoll may report pending shutdowns using EPOLLRDHUP. Since this
flag is missing from a number of libcs despite being available
since kernel 2.6.17, let's define it ourselves.
Doing so saves one syscall by allow us to avoid the read()==0 when
the server closes with the respose.
SIGPROF is blocked then restored to default settings, which sometimes
makes profiling harder. Let's not block it by default nor restore it,
it doesn't serve any purpose anyway.
As stated in both RFC2616 and the http-bis drafts, Cache-Control:
no-transform must be looked up in the response since we're modifying
the response. However, its presence in the request is irrelevant to
any changes in the response :
7.2.1.6. no-transform
The "no-transform" request directive indicates that an intermediary
(whether or not it implements a cache) MUST NOT change the Content-
Encoding, Content-Range or Content-Type request header fields, nor
the request representation.
7.2.2.9. no-transform
The "no-transform" response directive indicates that an intermediary
(regardless of whether it implements a cache) MUST NOT change the
Content-Encoding, Content-Range or Content-Type response header
fields, nor the response representation.
Note: according to the specs, we're supposed to emit the following
response header :
Warning: 214 transformation applied
However no other product seems to do it, so the effect on user agents
is unclear.
Reinout Verkerk from Trilex reported an issue with servers recently
flapping after an haproxy upgrade. Haproxy checks a simple agent
returning an HTTP response. The issue is that if the request packet
is lost but the simple agent responds before reading the HTTP request
and closes, the server will emit a TCP RST once the request finally
reaches it.
The way checks have been ported to use connections makes the error
flag show up as a failure after the success, reporting a stupid case
where the server is said to be down with a correct response.
In order to fix this, let's ignore the connection's error flag if a
successful check has already been reported. Reinout could verify that
a patched server did not exhibit the problem anymore.
Commit 7bb68abb introduced the SI_FL_NOHALF flag in dev10. It is used
to automatically close the write side of a connection whose read side
is closed. But the patch also caused the opposite to happen, which is
that a simple shutw() call would immediately close the connection. This
is not desired because when using option abortonclose, we want to pass
the client's shutdown to the server which will decide what to do with
it. So let's avoid the close when SHUTR is not set.
option abortonclose may cause a valid connection to be aborted just
after the request has been sent. This is because we check for it
during the session establishment sequence before checking for write
activity. So if the abort and the connect complete at the same time,
the abort is still considered. Let's check for an explicity partial
write before aborting.
This fix should be backported to 1.4 too.
When a frontend is rate-limited to 1000 connections per second, the
effective rate measured from the client is 999/s, and connections
experience an average response time of 99.5 ms with a standard
deviation of 2 ms.
The reason for this inaccuracy is that when computing frequency
counters, we use one part of the previous value proportional to the
number of milliseconds remaining in the current second. But even the
last millisecond still uses a part of the past value, which is wrong :
since we have a 1ms resolution, the last millisecond must be dedicated
only to filling the current second.
So we slightly adjust the algorithm to use 999/1000 of the past value
during the first millisecond, and 0/1000 of the past value during the
last millisecond. We also slightly improve the computation by computing
the remaining time instead of the current time in tv_update_date(), so
that we don't have to negate the value in each frequency counter.
Now with the fix, the connection rate measured by both the client and
haproxy is a steady 1000/s, the average response time measured is 99.2ms
and more importantly, the standard deviation has been divided by 3 to
0.6 millisecond.
This fix should also be backported to 1.4 which has the same issue.
The "reqtarpit" rule is not very handy to use. Now that we have more
flexibility with "http-request", let's finally make the tarpit rules
usable there.
There are still semantical differences between apply_filters_to_request()
and http_req_get_intercept_rule() because the former updates the counters
while the latter does not. So we currently have almost similar code leafs
for similar conditions, but this should be cleaned up later.
These are exactly the same as the classic redirect rules except
that they can be interleaved with other http-request rules for
more flexibility.
The redirect parser should probably be changed to stop at the condition
so that the caller puts its own condition pointer. At the moment, the
redirect rule and condition are parsed at once by build_redirect_rule()
and the condition is assigned to the http_req_rule.
We now have http_apply_redirect_rule() which does all the redirect-specific
job instead of having this inside http_process_req_common().
Also one of the benefit gained from uniformizing this code is that both
keep-alive and close response do emit the PR-- flags. The fix for the
flags could probably be backported to 1.4 though it's very minor.
The previous function http_perform_redirect() was becoming confusing
so it was renamed http_perform_server_redirect() since it only applies
to server-based redirection.
Several bugs were introduced recently due to a misunderstanding of how
this function works and what it was supposed to do. Since it's supposed
to only return the pointer to a rule which aborts further processing of
the request, let's rename it to avoid further issues.
The function was also slightly cleaned up without any functional change.
It happens that all of them call parse_logformat_line() which sets
proxy->to_log with a number of flags affecting the line format for
all three users. For example, having a unique-id specified disables
the default log-format since fe->to_log is tested when the session
is established.
Similarly, having "option logasap" will cause "+" to be inserted in
unique-id or headers referencing some of the fields depending on
LW_BYTES.
This patch first removes most of the dependency on fe->to_log whenever
possible. The first possible cleanup is to stop checking fe->to_log
for being null, considering that it always contains at least LW_INIT
when any such usage is made of the log-format!
Also, some checks are wrong. s->logs.logwait cannot be nulled by
"logwait &= ~LW_*" since LW_INIT is always there. This results in
getting the wrong log at the end of a request or session when a
unique-id or add-header is set, because logwait is still not null
but the log-format is not checked.
Further cleanups are required. Most LW_* flags should be removed or at
least replaced with what they really mean (eg: depend on client-side
connection, depend on server-side connection, etc...) and this should
only affect logging, not other mechanisms.
This patch fixes the default log-format and tries to limit interferences
between the log formats, but does not pretend to do more for the moment,
since it's the most visible breakage.
After the response headers are sent and the request processing is done,
the buffers are wiped out and the stream interface is closed. We must
then disable the request analysers, otherwise some processing will
happen on a closed stream interface and empty buffers which do not
match, causing all sort of crashes. This issue was introduced with
recent work on the stats, and was reported by Seri.
David BERARD reported that http-request add-header passes a \0 along
with the header field, which of course is not appropriate. This is
caused by build_logline() which sometimes returns the size with the
trailing zero and sometimes can return an empty string. Let's fix
this function instead of fixing the places where it's used.
Previous commit was still wrong, it broke add-header and set-header
because we don't want to leave on these actions.
The http_check_access_rule() function should be redesigned, it was
initially thought for allow/deny rules but now it is executing other
non-final rules and at the same time returning a pointer to the last
final rule. That becomes a bit confusing and will need to be addressed
before we implement redirect and return.
This commit adding http-request add-header/set-header unfortunately introduced
a regression to the handling of the stats page which is not matched anymore.
Thanks to Dmitry Sivachenko for reporting this.
These two new statements allow to pass information extracted from the request
to the server. It's particularly useful for passing SSL information to the
server, but may be used for various other purposes such as combining headers
together to emulate internal variables.
These macros (U2H, U2A, LIM2A, ...) have been used with an explicit
index for the local storage variable, making it difficult to change
log formats and causing a few issues from time to time. Let's have
a single macro with a rotating index so that up to 10 conversions
may be used in a single call.
Frontend, listener, server and backend detailed counters are now spread
over several lines in a table when the pointer hovers over the <u> area.
The values are much more readble, and the extra space gained this way
allowed to report some percentages.
Note: some incoherencies still exist between some counters. For example,
the backend's cum_conn is increased when a session is assigned instead
of increasing cum_sess.
Using titles to report detailed information is not convenient. The
browser decides to wrap the line where it wants, generally the box
quickly fades away, and it's not possible to copy-paste the text
from the box.
By using two levels, we can make a block appear/disappear depending
on whether its parent it being hovered or not, for example :
.tip { display: none; }
u:hover .tip { display: block; }
or better:
.tip { display: block; visibility: hidden; }
u:hover .tip { visibility: visible; }
Toggling visibility ensures that the place required to display the block
is always reserved. This is important to display boxes that are close to
the edges.
Then using <span class="tip">this is a box</span> will make the text
appear only when the upper <u> is hovered.
But this still adds much text. So instead we use a generic <div> tag
which we don't use anywhere else. That way we don't have to specify a
class :
div { display: block; visibility: hidden; }
u:hover div { visibility: hidden; }
This works pretty well, even in old browsers from 2005.
This commit does not change the display format, it only replaces the
title attribute with the div tag. Later commits will adjust the layout.
At the moment, we need trash chunks almost everywhere and the only
correctly implemented one is in the sample code. Let's move this to
the chunks so that all other places can use this allocator.
Additionally, the get_trash_chunk() function now really returns two
different chunks. Previously it used to always overwrite the same
chunk and point it to a different buffer, which was a bit tricky
because it's not obvious that two consecutive results do alias each
other.
The commit above improved error reporting during log parsing, but as
a result, some shared strings such as httplog_format are truncated
during parsing. This is observable upon startup because the second
proxy to use httplog emits a warning.
Let's have the logformat parser duplicate the string while parsing it.
All the functions that were called "http" or "raw" have been cleaned up
to support either only HTML and use that word in their name, or support
both HTML and CSV and be usable both from the HTTP and the CLI entry
points.
stats_http_dump() now explicitly checks for HTML mode before adding
HTML-specific headers/trailers, and has been renamed since it's now
used by both entry points.
Some more duplicated code could be removed. The patch looks big but it's
mostly due to re-indents resulting from the moves or comments updates.
The calling sequences now look like this :
cli_io_handler()
-> stats_dump_sess_to_buffer() // "show sess"
-> stats_dump_errors_to_buffer() // "show errors"
-> stats_dump_info_to_buffer() // "show info"
-> stats_dump_stat_to_buffer() // "show stat"
-> stats_dump_csv_header()
-> stats_dump_proxy_to_buffer()
-> stats_dump_fe_stats()
-> stats_dump_li_stats()
-> stats_dump_sv_stats()
-> stats_dump_be_stats()
http_stats_io_handler()
-> stats_dump_stat_to_buffer() // same as above, but used for CSV or HTML
-> stats_dump_csv_header() // emits the CSV headers (same as above)
-> stats_dump_html_head() // emits the HTML headers
-> stats_dump_html_info() // emits the equivalent of "show info" at the top
-> stats_dump_proxy_to_buffer() // same as above, valid for CSV and HTML
-> stats_dump_html_px_hdr()
-> stats_dump_fe_stats()
-> stats_dump_li_stats()
-> stats_dump_sv_stats()
-> stats_dump_be_stats()
-> stats_dump_html_px_end()
-> stats_dump_html_end() // emits HTML trailer
The HTTP header injection that are performed in dumpstats when responding
or when redirecting a POST request have nothing to do in dumpstats. They
do not use any state from the stats, and are 100% HTTP. Let's make the
headers there in the HTTP core, and have dumpstats only produce stats.
The dumpstats code looks like a spaghetti plate. Several functions are
supposed to be able to do several things but rely on complex states to
dispatch the work to independant functions. Most of the HTML output is
performed within the switch/case statements of the whole state machine.
Let's clean this up by adding new functions to emit the data and have
a few more iterators to avoid relying on so complex states.
The new stats dump sequence looks like this for CLI and for HTTP :
cli_io_handler()
-> stats_dump_sess_to_buffer() // "show sess"
-> stats_dump_errors_to_buffer() // "show errors"
-> stats_dump_raw_info_to_buffer() // "show info"
-> stats_dump_raw_info()
-> stats_dump_raw_stat_to_buffer() // "show stat"
-> stats_dump_csv_header()
-> stats_dump_proxy()
-> stats_dump_px_hdr()
-> stats_dump_fe_stats()
-> stats_dump_li_stats()
-> stats_dump_sv_stats()
-> stats_dump_be_stats()
-> stats_dump_px_end()
http_stats_io_handler()
-> stats_http_redir()
-> stats_dump_http() // also emits the HTTP headers
-> stats_dump_html_head() // emits the HTML headers
-> stats_dump_csv_header() // emits the CSV headers (same as above)
-> stats_dump_http_info() // note: ignores non-HTML output
-> stats_dump_proxy() // same as above
-> stats_dump_http_end() // emits HTML trailer
Using %[expression] it becomes possible to make the log engine fetch
some samples from the request or the response and provide them in the
logs. Note that this feature is still limited, it does not yet allow
to apply converters, to limit the output length, nor to specify the
direction which should be fetched when a fetch function works in both
directions.
However it's quite convenient to log SSL information or to include some
information that are used in stick tables.
It is worth noting that this has been done in the generic log format
handler, which means that the same information may be used to build the
unique-id header and to pass the information to a backend server.
The log-format parser reached a limit making it hard to add new features.
It also suffers from a weak handling of certain incorrect corner cases,
for example "%{foo}" is emitted as a litteral while syntactically it's an
argument to no variable. Also the argument parser had to redo some of the
job with some cases causing minor memory leaks (eg: ignored args).
This work aims at improving the situation so that slightly better reporting
is possible and that it becomes possible to extend the log format. The code
has a few more states but looks significantly simpler. The parser is now
capable of reporting ignored arguments and truncated lines.
The <type> argument was checked against LOG_FMT_* but it was passed as
LF_* which are two independant enums. It happens that the 3 first entries
in these enums do match, but this broke some experimental changes which
required another state, so let's fix this now.
Some log tokens have evolved in a way that is not completely logical.
For example, frontend tokens sometimes begin with an 'f' and sometimes
with an 'F'. Same for backend and server.
So let's change a few cases without disrupting compatibility with existing
setups :
Bi => bi
Bp => bp
Ci => ci
Cp => cp
Fi => fi
Fp => fp
Si => si
Sp => sp
cc => CC
cs => CS
st => ST
The old ones are still supported but deprecated and will be unsupported by
the 1.5 release. However, a warning message is emitted when they're encounterd
and it indicates what token should be used to replace them.
When log format arguments are specified within braces with no '+' nor '-'
prefix, the NULL string is compared with known keywords causing a crash.
This only happens during parsing so it does not affect runtime processing.
When a server responds prematurely to a POST request, haproxy used to
cause the transfer to be aborted before the end. This is problematic
because this causes the client to receive a TCP reset when it tries to
push more data, generally preventing it from receiving the response
which contain the reason for the premature reponse (eg: "entity too
large" or an authentication request).
From now on we take care of allowing the upload traffic to flow to the
server even when the response has been received, since the server is
supposed to drain it. That way the client receives the server response.
This bug has been present since 1.4 and the fix should probably be
backported there.
The code review during the chase for the POST freeze uncovered another possible
issue which might appear when we perform an incomplete read and want to stop because
of READ_DONTWAIT or because we reached the maximum read_poll limit. Reading is
disabled but SI_FL_WAIT_ROOM was not set, possibly causing some cases where a
send() on the other side would not wake the reader up until another activity
on the same side calls the update function which fixes its status.
Since the changes in connection management, it became necessary to re-enable
polling after a fast-forward transfer would complete.
One such issue was addressed after dev12 by commit 9f7c6a18 (BUG/MAJOR:
stream_interface: certain workloads could cause get stuck) but unfortunately,
it was incomplete as very subtle cases would occasionally remain unaddressed
when a buffer was marked with the NOEXP flag, which is used during POST
uploads. The wake up must be performed even when the flag is there, the
flag is used only to refresh the timeout.
Too many conditions need to be hit together for the situation to be
reproducible, but it systematically appears for some users.
It is particularly important to credit Sander Klein and John Rood from
Picturae ICT ( http://picturae.com/ ) for reporting this bug on the mailing
list, providing configs and countless traces showing the bug in action, and
for their patience testing litteraly tens of snapshots and versions of
supposed fixes during a full week to narrow the commit range until the bug
was really knocked down! As a side effect of their numerous tests, several
other bugs were fixed.
stream_int_chk_rcv_conn() did not clear connection flags before updating them. It
is unsure whether this could have caused the stalled transfers that have been
reported since dev15.
In order to avoid such further issues, we now use a simple inline function to do
all the job.
Back in the days where polling was made with select() where all FDs
were checked at once, stream_int_chk_snd_conn() used to check whether
the file descriptor it was passed was ready or not, so that it did
not perform the work for nothing.
Right now FDs are checked just before calling the I/O handler so this
test never matches at best, or may return false information at worst.
Since conn_fd_handler() always clears the flags upon exit, it looks
like a missed event cannot happen right now. Still, better remove
this outdated check than wait for it to cause issues.
Sander Klein reported a rare case of POST transfers being stalled
after a few megabytes since dev15. One possible culprit is the fix
for the CPU spinning issues which is not totally correct, because
stream_int_chk_snd_conn() would inconditionally enable the
CO_FL_CURR_WR_ENA flag.
What could theorically happen is the following sequence :
1) send buffer is empty, server-side polling is disabled
2) client sends some data
3) such data are forwarded to the server using
stream_int_chk_snd_conn()
4) conn->flags |= CO_FL_CURR_WR_ENA
5) si_conn_send_loop() is called
6) raw_sock_from_buf() does a partial write due to full kernel buffers
7) stream_int_chk_snd_conn() detects this and requests to be called
to send the remaining data using __conn_data_want_send(), and clears
the SI_FL_WAIT_DATA flag on the stream interface, indicating that it
is already congestionned.
8) conn_cond_update_polling() calls conn_data_update_polling() which
sees that both CO_FL_DATA_WR_ENA and CO_FL_CURR_WR_ENA are set, so
it does not enable polling on the output fd.
9) the next chunk from the client fills the buffer
10) stream_int_chk_snd_conn() is called again
11) SI_FL_WAIT_DATA is already cleared, so the function immediately
returns without doing anything.
12) the buffer is now full with the FD write polling disabled and
everything deadlocks.
Not that there is no reason for such an issue not to happen the other
way around, from server to client, except maybe that due to the speed
difference between the client and the server, client-side polling is
always enabled and the buffer is never empty.
All this shows that the new polling still looks fragile, in part due
to the double information on the FD status, being both in fdtab[] and
in the connection, which looks unavoidable. We should probably have
some functions to tighten the relation between such flags and avoid
manipulating them by hand.
Also, the effects of chk_snd() on the polling are still under-estimated,
while the relation between the stream_int and the FD is still too much
present. Maybe the function should be rethought to only call the connection's
fd handler. The connection model probably needs two calling conventions
for bottom half and upper half.
J. Maurice reported that ssllabs.com test affects unrelated
legitimate traffic and cause SSL errors and broken connections.
Sometimes openssl store read/write/handshake errors in a global stack. This
stack is not specific to the current session. Openssl API does not clean the
stack at the beginning of a new read/write. And the function used to retrieve
error ID after read/write, returns the generic error SSL_ERROR_SSL if the
global stack is not empty.
The fix consists in cleaning the errors stack after read/write/handshake
errors.
The two ACL fetches "resp_ver" and "status", if used in a request despite
the warning, would return a match of zero length. This is inappropriate,
better return a non-match to be more consistent with other ACL processing.
When a polled I/O event is detected, the event is added to the updates
list and the I/O handler is called. Upon return, if the event handler
did not experience an EAGAIN, the event remains in the updates list so
that it will be processed later. But if the event was already in the
spec list, its state is updated and it will be called again immediately
upon exit, by fd_process_spec_events(), so this creates unfairness
between speculative events and polled events.
So don't call the I/O handler upon I/O detection when the FD already is
in the spec list. The fd events are still updated so that the spec list
is up to date with the possible I/O change.
The epoll loop checks for newly appeared FDs in order to process them early
if they're accepted sockets. Since the introduction of the fd_ev_set()
calls before the iocb(), the current FD is always in the update list,
and we don't want to check it again, so we must assign the old_updt
index just before calling the I/O handler.
Playing with fdtab[fd].ev makes gcc constantly reload the pointers
because it does not know they don't alias. Use a temporary variable
instead. This saves a few operations in the fast path.
In ev_poll and ev_epoll, we have a bit-to-bit mapping between the POLL_
constants and the FD_POLL_ constants. A comment said that gcc was able
to detect this and to automatically apply a mask. Things have possibly
changed since the output assembly doesn't always reflect this. So let's
perform an explicit assignment when bits are equal.
There were a few synchronous calls to polling updates in some functions
called from the connection handler. These ones are not needed and should
be replaced by more efficient and more debugable asynchronous calls.
Bryan Berry and Baptiste Assmann both reported some occasional CPU
spinning loops where haproxy was still processing I/O but burning
CPU for apparently uncaught events.
What happens is the following sequence :
- proxy is in TCP mode
- a connection from a client initiates a connection to a server
- the connection to the server does not immediately happen and is
polled for
- in the mean time, the client speaks and the stream interface
calls ->chk_snd() on the peer connection to send the new data
- chk_snd() calls send_loop() to send the data. This last one
makes the connection succeed and empties the buffer, so it
disables polling on the connection and on the FD by creating
an update entry.
- before the update is processed, poll() succeeds and reports
a write event for this fd. The poller does fd_ev_set() on the
FD to switch it to speculative mode
- the IO handler is called with a connection which has no write
flag but an FD which is enabled in speculative mode.
- the connection does nothing useful.
- conn_update_polling() at the end of conn_fd_handler() cannot
disable the FD because there were no changes on this FD.
- the handler is left with speculative polling still enabled on
the FD, and will be called over and over until a poll event is
needed to transfer data.
There is no perfectly elegant solution to this. At least we should
update the flags indicating the current polling status to reflect
what is being done at the FD level. This will allow to detect that
the FD needs to be disabled upon exit.
chk_snd() also needs minor changes to correctly switch to speculative
polling before calling send_loop(), and to reflect this in the connection
flags. This is needed so that no event remains stuck there without any
polling. In fact, chk_snd() and chk_rcv() should perform the same number
of preparations and cleanups as conn_fd_handler().
Sample fetch capabilities indicate when the fetch may be used and not
what it requires, so we need to check if a fetch is compatible with
the direction we want and not if it works backwards.
The stick counters were in two distinct sets of struct members,
causing some code to be duplicated. Now we use an array, which
enables some processing to be performed in loops. This allowed
the code to be shrunk by 700 bytes.
This returns the concatenation of the base32 fetch and the src fetch.
The resulting type is of type binary, with a size of 8 or 20 bytes
depending on the source address family. This can be used to track
per-IP, per-URL counters.
This returns a 32-bit hash of the value returned by the "base"
fetch method above. This is useful to track per-URL activity on
high traffic sites without having to store all URLs. Instead a
shorter hash is stored, saving a lot of memory. The output type
is an unsigned integer.
Returns the current amount of concurrent connections tracking the same
tracked counters. This number is automatically incremented when tracking
begins and decremented when tracking stops. It differs from sc1_conn_cur in
that it does not rely on any stored information but on the table's reference
count (the "use" value which is returned by "show table" on the CLI). This
may sometimes be more suited for layer7 tracking.
Until now it was only possible to use track-sc1/sc2 with "src" which
is the IPv4 source address. Now we can use track-sc1/sc2 with any fetch
as well as any transformation type. It works just like the "stick"
directive.
Samples are automatically converted to the correct types for the table.
Only "tcp-request content" rules may use L7 information, and such information
must already be present when the tracking is set up. For example it becomes
possible to track the IP address passed in the X-Forwarded-For header.
HTTP request processing now also considers tracking from backend rules
because we want to be able to update the counters even when the request
was already parsed and tracked.
Some more controls need to be performed (eg: samples do not distinguish
between L4 and L6).
Commit 07115412 (MEDIUM: stick-table: allocate the table key...) broke
conversion of samples to strings for stick tables, because if replaced
char buf[BUFSIZE] with char buf[0] and the string converters use sizeof
on this part. Note that sizeof was wrong as well but at least it used
to work.
Fix this by making use of the len parameter instead of sizeof.
This is just like previous commit, but for the backend this time. All this
code did not need to remain duplicated. These are 500 more bytes shaved off.
The tproxy and source binding code has now be factored out for
servers and backends. A nice effect is that the code now supports
having backends use source port ranges, though the config does not
support it yet. This change has reduced the executable by around
700 bytes.
Both servers and proxies share a common set of parameters for outgoing
connections, and since they're not stored in a similar structure, a lot
of code is duplicated in the connection setup, which is one sensible
area.
Let's first define a common struct for these settings and make use of it.
Next patches will de-duplicate code.
This change also fixes a build breakage that happens when USE_LINUX_TPROXY
is not set but USE_CTTPROXY is set, which seem to be very unlikely
considering that the issue was introduced almost 2 years ago an never
reported.
When connect() fails with EAGAIN or EADDRINUSE, an error message is
sent to logs and uses srv->id to indicate the server name (this is
very old code). Since version 1.4, it is possible to have srv == NULL,
so the message could cause a crash when connect() returns EAGAIN or
EADDRINUSE. However in practice this does not happen because on lack
of source ports, EADDRNOTAVAIL is returned instead, so this code is
never called.
This fix consists in not displaying the server name anymore, and in
adding the test for EADDRNOTAVAIL.
Also, the log level was lowered from LOG_EMERG to LOG_ERR in order
not to spam all consoles when source ports are missing for a given
target.
This fix should be backported to 1.4.
tcp_connect_server() resets all of the connection's flags. This means
that an outgoing connection does not have the ADDR_TO_SET flag
eventhough the address is set.
The first impact is that logging the outgoing address or displaying
it on the CLI while dumping sessions will result in an extra call to
getpeername().
But there is a nastier impact. If such a lookup happens *after* the
first connect() attempt and this one fails, the destination address
is corrupted by the call to getsockname(), and subsequent connection
retries will fail with socket errors.
For now we fix this by making tcp_connect_server() set the flag. But
we'll soon need a function to initialize an outgoing connection with
appropriate address and flags before calling the connect() function.
Commit 2b199c9a attempted to fix all places where the transport layer
is improperly closed, but it missed one place in session_free(). If
SSL ciphers are logged, the close() is delayed post-log and performed
in session_free(). However, conn_xprt_close() only closes the transport
layer but not the file descriptor, resulting in a slow FD leak which is
hardly noticeable until the process cannot accept any new connection.
A workaround consisted in disabling %sslv/%sslc in log-format.
So use conn_full_close() instead of conn_xprt_close() to fix this there
too.
A similar pending issue existed in the close during outgoing connection
failure, though on this side, the transport layer is never tracked at the
moment.
Errors and Hangups are sticky events, which means that once they're
detected, we never clear them, allowing them to be handled later if
needed.
Till now when an error was reported, it used to register a speculative
I/O event for both recv and send. Since the connection had not requested
such events, it was not able to detect a change and did not clear them,
so the events were called in loops until a timeout caused their owner
task to die.
So this patch does two things :
- stop registering spec events when no I/O activity was requested,
so that we don't end up with non-disablable polling state ;
- keep the sticky polling flags (ERR and HUP) when leaving the
connection handler so that an error notification doesn't
magically become a normal recv() or send() report once the
event is converted to a spec event.
It is normally not needed to make the connection handler emit an
error when it detects POLL_ERR because either a registered data
handler will have done it, or the event will be disabled by the
wake() callback.
In raw_sock, we already check for FD_POLL_HUP after a short recv()
to avoid a useless syscall and detect the end of stream. However,
we fail to check for FD_POLL_ERR here, which causes major issues
as some errors might be delivered and ignored if they are delivered
at the same time as a HUP, and there is no data to send to detect
them on the other direction.
Since the connections flags do not have the CO_FL_ERROR flag, the
polling is not disabled on the socket and the pollers immediately
call the conn_fd_handler() again, resulting in CPU spikes for as
long as the timeouts allow them.
Note that this patch alone fixes the issue but a few patches will
follow to strengthen this fragile area.
Big thanks to Bryan Berry who reported the issue with significant
amounts of detailed traces that helped rule out many other initially
suspected causes and to finally reproduce the issue in the lab.
Considering there is no option yet for maxconnrate for servers, I wrote
an ACL to check a backend server session rate which we use to send to an
"overflow" backend to prevent latency responses to our clients (very
sensitive latency requirements).
Benjamin Polidore reported a build issue on Solaris with gcc 4.2.4 where
stdbool is not usable without c99. It only appeared at one location in
dumpstats and is totally useless, let's use the more common and portable
int as everywhere else.
Sessions using client certs are huge (more than 1 kB) and do not fit
in session cache, or require a huge cache.
In this new implementation sshcachesize set a number of available blocks
instead a number of available sessions.
Each block is large enough (128 bytes) to store a simple session (without
client certs).
Huge sessions will take multiple blocks depending on client certificate size.
Note: some unused code for session sync with remote peers was temporarily
removed.
If a client aborts a request with an error (typically a TCP reset), we must
log a 400. Till now we did not set the status nor close the stream interface,
causing the request to attempt to be forwarded and logging a 503.
Should be backported to 1.4 which is affected as well.
When using ca_ignore_err/crt_ignore_err, a connection to an untrusted
server raises an error which is ignored. But the next SSL_read() that
encounters EAGAIN raises the error again, breaking the connection.
Subsequent connections don't have this problem because the session has
been stored and is correctly reused without performing a verify again.
The solution consists in correctly flushing the SSL error stack when
ignoring the crt/ca error.
When the PROXY protocol header is expected and fails, leading to an
abort of the incoming connection, we now emit a log message. If option
dontlognull is set and it was just a port probe, then nothing is logged.
Since the introduction of SSL, it became quite annoying not to get any useful
info in logs about handshake failures. Let's improve reporting for embryonic
sessions by checking a per-connection error code and reporting it into the logs
if an error happens before the session is completely instanciated.
The "dontlognull" option is supported in that if a connection does not talk
before being aborted, nothing will be emitted.
At the moment, only timeouts are considered for SSL and the PROXY protocol,
but next patches will handle more errors.
It's annoying that handshake handlers remove themselves from the
connection flags when they fail because there is no way to tell
which one fails. So now we only remove them when they succeed.
To ensure that we only count when a response was compressed, we also
check for the SN_COMP_READY flag which indicates that the compression
was effectively initialized. Comp_algo alone is meaningless.
Compression was not disabled on 1xx, 204, 304 nor HEAD requests. This
is not really a problem, but it reports more compressed responses than
really done.
Commit 5730c68b changed to display compression ratios based on 2xx
responses, but we should then check that there are such responses
instead of checking for requests. The risk is a divide error if there
are some requests but no 2xx yet (eg: redirect).
Let's only look up the content-type header once. This involves
inverting the condition which is not dramatic.
Also, we now always check the value length before comparing it, and we
always reset the ctx.idx before looking a header up. Otherwise that
could make header lookups depend on their on-wire order. It would be
a minor issue however since at worst it would cause some responses not
to be compressed.
Since only responses with status 200 can be compressed, let's only count the
ratio of compressed responses on the basis of the 2xx responses and not all
of them. Note that responses 206 are still included in this count but it gives
a better figure, especially for places where authentication is used and 401 is
common.
The compression is disabled when the HTTP status code is not 200, indeed
compression on some HTTP code can create issues (ex: 206, 416).
Multipart message should not be compressed eitherway.
If a client aborts with an abortonclose flag, the close is forwarded
to the server and when server response is processed, the analyser thinks
it's the server who has closed first, and logs flags "SD" or "SH" and
counts a server error. In order to avoid this, we now first detect that
the client has closed and log a client abort instead.
This likely is the reason why many people have been observing a small rate
of SD/SH flags without being able to find what the error was.
This fix should probably be backported to 1.4.
This change removes pointers for known types (stream_interface, ...),
adds buffer pointers and sizes, and moves buffer information to their
own line. The output is cleaner with shorter lines and slightly more
lines.
show sess <id> puts a backref into the session it's dumping. If the output
is interrupted, the backref cannot always be removed because it's only done
in the I/O handler. This can randomly corrupt the backref list when the
session closes, because it passes the pointer to the next session which
itself might be watched.
The case is hard to reproduce (hundreds of attempts) but monitoring systems
might encounter it frequently.
Thus we have to add a release handler which does the cleanup even when the
I/O handler is not called.
This issue should also be present in 1.4 so the patch should be backported.
Sometimes when debugging haproxy, it is important to take a full
snapshot of all sessions and their respective states. Till now it
was complicated to do because we had to use scripts and sessions
would vanish between two runs.
Now with this command we have the same output as "show sess $id"
but for all sessions in the table. This is a debugging command only,
it should only be used by developers as it is never guaranteed to
perfectly work !
Commit 9b6700f added "v6only". As suggested by Vincent Bernat, it is
sometimes useful to have the opposite option to force binding to the
two protocols when the system is configured to bind to v6 only by
default. This option does exactly this. v6only still has precedence.
Depending on the content-types and accept-encoding fields, some responses
might or might not be compressed. Let's have a counter of the number of
compressed responses and report it in the stats to help improve compression
usage.
Some cosmetic issues were fixed in the CSV output too (missing commas at the
end).
It's interesting to know the average compression ratio obtained on
frontends and backends without having to compute it by hand, so let's
report it in the HTML stats.
The conn_local_send_proxy() function has to retrieve the local and remote
addresses, but the getpeername() and getsockname() functions may fail until
the connection is established. So now we catch this error and poll for write
when this happens.
If an uncaught CO_FL_ERROR flag on a connection is detected, we
immediately go to the wakeup function. This ensures that even if
an error is asynchronously delivered, we don't risk re-enabling
polling or doing unexpected things in the handshake handlers.
Commit 0ffde2cc in 1.5-dev13 tried to always disable polling on file
descriptors when errors were encountered. Unfortunately it did not
always succeed in doing so because it relied on detecting polling
changes to disable it. Let's use a dedicated conn_stop_polling()
function that is inconditionally called upon error instead.
This managed to stop a busy loop observed when a health check makes
use of the send-proxy protocol and fails before the connection can
be established.
Commit 24db47e0 tried to improve support for delayed ACK upon connect
but it was incomplete, because checks with the proxy protocol would
always enable polling for data receive and there was no way of
distinguishing data polling and delayed ack.
So we add a distinct delack flag to the connect() function so that
the caller decides whether or not to use a delayed ack regardless
of pending data (eg: when send-proxy is in use). Doing so covers all
combinations of { (check with data), (sendproxy), (smart-connect) }.
When leaving, during the deinit() process, prune_acl_expr() is called to
delete all ACL expressions. A bug was introduced with commit 34db1084 that
caused every other expression argument to be skipped, and more annoyingly,
it introduced the risk of scanning past the arg list and crashing or
freezing the old process during a reload.
Credits for finding this issue go to Dmitry Sivachenko who first reported
it, and second did a lot of research to narrow it down to a minimal
configuration.
Since 1.5-dev9, ACLs support multiple args. The changes performed in
acl_find_targets() were bogus as they were not always applied to the
current argument being processed, but sometimes to the first one only.
Fortunately till now, all ACLs which support resolvable arguments have
it in the first place only, so there was no impact.
If some listeners are mistakenly configured with 0 as the maxaccept value,
then we now consider them as limited to one accept() at a time. This will
avoid some issues as fixed by the past commit.
Recent commit 16a214 to move the maxaccept parameter to listeners didn't
set it on the peers' listeners, resulting in the value zero being used
there. This caused a busy loop for each peers section, because no incoming
connection could be accepted.
Thanks to Hervé Commowick for reporting this issue.
Several places got the connection close sequence wrong because it
was not obvious. In practice we always need the same sequence when
aborting, so let's have a common function for this.
Commit a522f801 moved a call to __conn_data_want_recv() just after the
connect() call, which is not 100% correct. First, it does not take errors
into account, eventhough this is harmless. Second, this change will only
be taken into account after next call do conn_data_polling_update(), which
is not necessarily what is expected (eg: if an error is only reported on
the recv side).
So let's use conn_data_poll_recv() instead, which directly subscribes
the event to polling.
Since last commit, some timeouts were converted into an error to report
the status, and as a result, the socket was not closed because it was
supposed to have been done during the wake() call.
Close the socket as soon as the timeout is detected to fix the issue.
Also we now ensure to first initialize the connection flags.
Until now, the check socked was closed in the task which handles the
check, which can sometimes be substantially later when many tasks are
running. It's much cleaner to close() in the wake call, which also
helps removing some FD management from the task itself.
The code is faster and smaller, and fast health checks show a more
predictable behaviour.
Pure TCP checks only use the SYN/ACK in return to a SYN. By forcing
the system to use delayed ACKs, it is possible to send an RST instead
of the ACK and thus ensure that the application will never be needlessly
woken up. This avoids error logs or counters on checked components since
the application is never made aware of this connection which dies in the
network stack.
The process_chk() function still did not consider the the timeout when
it was woken up, so a spurious wakeup could trigger a false timeout. Some
checks were now redundant or could not be triggered (eg: L7 timeout).
So remove them and rearrange the timeout detection.
The porting of checks to using connections was totally bogus. Some checks
were considered successful as soon as the connection was established,
regardless of any response. Some errors would be triggered upon recv
if polling was enabled for send or if the send channel was shut down.
Now the behaviour is much better. It would be cleaner to perform the
fd_delete() in wake_srv_chk() and to process failures and timeouts
separately, but this is already a good start.
Some server check flag names were not properly choosen and cause
analysis trouble, especially the CHK_RUNNING one which does not
mean that a check is running but that the server is running...
Here's the rename :
CHK_RUNNING -> CHK_PASSED
CHK_ERROR -> CHK_FAILED
Some checks which do not induce a close from the server accumulate
local TIME_WAIT sockets because they're cleanly shut down. Typically
TCP probes cause this. This is very problematic when there are many
servers, when the checks are fast or when local source ports are rare.
So now we'll disable lingering on the socket instead of sending a
shutdown. Before doing this we try to drain any possibly pending data.
That way we avoid sending an RST when the server has closed first.
This change means that some servers will see more RSTs, but this is
needed to avoid local source port starvation.
When a check succeeds, it used to only disable receive events while
it should disable both directions. The problem is that if the send
event was reported too, it could re-enable the recv event. In theory
this is not a problem as the task is going to be woken up, but if
there are many tasks in the queue and this task is not processed
immediately, we could theorically face a storm of unprocessed events
(typically POLL_HUP).
So better stop both directions, prevent the send side from enabling
recv and have the process_chk() code enable both directions. This
will also help detecting closes before the check is sent.
Note that all this mess has been inherited from the old code that used
the fd as a flag to report if a check was running. We should have a
dedicated flag and perform the fd_delete() in wake_srv_chk() instead.
Health checks currently still use the connection's fd to know whether
a check is running (this needs to change). When a health check
immediately fails during connect() because of a lack of local resource
(eg: port), we failed to unset the fd, so each time the process_chk
woken up after such an error, it believed a check was still running
and used to close the fd again instead of starting a new check. This
could result in other connections being closed because they were
assigned the same fd value.
The bug is only marked medium because when this happens, the system
is already in a bad state.
A comment was added above tcp_connect_server() to clarify that the
fd is *not* valid on error.
It was a bit frustrating to have no idea about the bandwidth saved by
HTTP compression. Now we have per-frontend and per-backend stats. The
stats on the HTTP interface are shown in a hover title in the "bytes out"
column if at least something was fed to the compressor. 3 new columns
appeared in the CSV stats output.
Some users need more than 64 characters to log large cookies. The limit
was set to 63 characters (and not 64 as previously documented). Now it
is possible to change this using the global "tune.http.cookielen" setting
if required.
The connection handling changed introduced in 1.5-dev12 introduced a
regression with commit 9bf9c14c. The issue is that the stream_sock_read0()
callback must update the channel flags to indicate that the side is closed
so that when process_session() is called, it can propagate the close to the
other side and terminate the session.
The issue only appears in HTTP tunnel mode. It's a bit tricky to trigger
the issue, it requires that the request channel is full with data flowing
from the client to the server and that both the response and the read0()
are received at once so that the flags are not updated, and that the HTTP
analyser switches to tunnel mode without being informed that the request
write side is closed. After that, process_session() does not know that the
connection has to be aborted either, and no more event appears on this side
where the connection stays here forever.
Many thanks to Igor at owind for testing several snapshots and for providing
valuable traces to reproduce and diagnose the issue!
New option 'maxcompcpuusage' in global section.
Sets the maximum CPU usage HAProxy can reach before stopping the
compression for new requests or decreasing the compression level of
current requests. It works like 'maxcomprate' but with the Idle.
Using compression rate limit, the compression level wasn't taking care
of the max compression level during a session because the test was done
on the wrong variable.
Some very specifically scheduled workloads could sometimes get stuck when
data receive was disabled due to buffer full then re-enabled due to a full
send(). A conn_data_want_recv() had to be set again in this specific case.
This bug was introduced with connection rework and polling changes in dev12.
This patch makes changes in the http_response_forward_body state
machine. It checks if the compress algorithm had consumed data before
swapping the temporary and the input buffer. So it prevents null sized
zlib chunks.
Disabling compression based on the content-type was improperly done since the
introduction of the COMP_READY flag, sometimes resulting in truncated responses.
global.tune.maxaccept was used for all listeners. This becomes really not
convenient when some listeners are bound to a single process and other ones
are bound to many processes.
Now we change the principle : we count the number of processes a listener
is bound to, and apply the maxaccept either entirely if there is a single
process, or divided by twice the number of processes in order to maintain
fairness.
The default limit has also been increased from 32 to 64 as it appeared that
on small machines, 32 was too low to achieve high connection rates.
The new "cpu-map" directive allows one to assign the CPU sets that
a process is allowed to bind to. This is useful in combination with
the "nbproc" and "bind-process" directives.
The support is implicit on Linux 2.6.28 and above.
Having nbproc preinitialized to zero is really annoying as it prevents
some checks from being correctly performed. Also the check to prevent
nbproc from being redefined is totally useless, so let's preset it to
1 and remove the test.
There was a possible memory leak in the zlib code when the first response of
a keep-alive session was compressed, because the next request would reset the
compression algo, preventing a later call to session_free() from releasing it.
The reason is that it is necessary to release the assigned resources in
http_end_txn_clean_session().
Zlib (at least 1.2 and 1.3) aborts when it fails to allocate the state, so we
must not count a round on this event. If the state succeeds, then it allocates
all the 4 remaining counters at once.
This is done by passing the default value to SSLCACHESIZE in sessions.
User can use tune.sslcachesize to change this value.
By default, it is set to 20000 sessions as openssl internal cache size.
Currently, a session entry size is between 592 and 616 bytes depending on the arch.
The lookup was broken by commit 050536d5. The server ID is
initialized to a negative value but unfortunately not all the
tests were converted. Thanks to Igor at owind for reporting it.
At least on a heavily patched 2.6.35.9, we can see splice() fail
with EBADF :
recv(6, "789.123456789.123456789.12345678"..., 1049, 0) = 1049
send(5, "HTTP/1.1 200\r\nContent-length: 10"..., 8030, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE) = 8030
gettimeofday({1352717854, 515601}, NULL) = 0
epoll_wait(0x3, 0x40221008, 0x7, 0) = 0
gettimeofday({1352717854, 515793}, NULL) = 0
pipe([7, 8]) = 0
splice(0x6, 0, 0x8, 0, 0xfe12c, 0x3) = -1 EBADF (Bad file descriptor)
close(6) = 0
This clearly is a kernel issue since all FDs are valid here, so let's
simply disable splice() on the connection when this happens so that
the session correctly recovers from that issue using recv().
SSL_do_handshake is not appropriate for reneg, it's only appropriate at the
beginning of a connection. OpenSSL correctly handles renegs using the data
functions, so we use SSL_peek() here to make its state machine progress if
SSL_renegotiate_pending() says a reneg is pending.
SSL may decide to switch to a handshake in the middle of a transfer due to
a reneg. In this case we don't want to re-enable polling because data might
have been left pending in the buffer. We just want to switch immediately to
the handshake mode.
Commit 09f245 came with a bug : if we don't process events from the
spec list that are also being polled, we can end up with some stuck
events that nobody processes.
We must process all events from the spec list even if they're being
polled in parallel.
Instead of storing a couple of (int, ptr) in the struct connection
and the struct session, we use a different method : we only store a
pointer to an integer which is stored inside the target object and
which contains a unique type identifier. That way, the pointer allows
us to retrieve the object type (by dereferencing it) and the object's
address (by computing the displacement in the target structure). The
NULL pointer always corresponds to OBJ_TYPE_NONE.
This reduces the size of the connection and session structs. It also
simplifies target assignment and compare.
In order to improve the generated code, we try to put the obj_type
element at the beginning of all the structs (listener, server, proxy,
si_applet), so that the original and target pointers are always equal.
A lot of code was touched by massive replaces, but the changes are not
that important.
Before connections were introduced, it was possible to connect an
external task to a stream interface. However it was left as an
exercise for the brave implementer to find how that ought to be
done.
The feature was broken since the introduction of connections and
was never fixed since due to lack of users. Better remove this dead
code now.
Hijackers were functions designed to inject data into channels in the
distant past. They became unused around 1.3.16, and since there has
not been any user of this mechanism to date, it's uncertain whether
the mechanism still works (and it's not really useful anymore). So
better remove it as well as the pointer it uses in the channel struct.
Some servers are not totally HTTP-compliant when it comes to parsing the
Connection header. This is particularly true with WebSocket where it happens
from time to time that a server doesn't support having a "close" token along
with the "Upgrade" token in the Connection header. This broken behaviour has
also been noticed on some clients though the problem is less frequent on the
response path.
Sometimes the workaround consists in enabling "option http-pretend-keepalive"
to leave the request Connection header untouched, but this is not always the
most convenient solution. This patch introduces a new solution : haproxy now
also looks for the "Upgrade" token in the Connection header and if it finds
it, then it refrains from adding any other token to the Connection header
(though "keep-alive" and "close" may still be removed if found). The same is
done for the response headers.
This way, WebSocket much with less changes even when facing non-compliant
clients or servers. At least it fixes the DISCONNECT issue that was seen
on the websocket.org test.
Note that haproxy does not change its internal mode, it just refrains from
adding new tokens to the connection header.
Now that all pollers make use of speculative I/O, there is no point
having two epoll implementations, so replace epoll with the sepoll code
and remove sepoll which has just become the standard epoll method.
The poller was updated to support speculative events. We'll need this
to fully support SSL.
As an a side effect, the code has become much simpler and much more
efficient, by taking advantage of the nice kqueue API which supports
batched updates. All references to fd_sets have disappeared, and only
the fdtab[].spec_e fields are used to decide about file descriptor
state.
si_fd() is not used a lot, and breaks builds on OpenBSD 5.2 which
defines this name for its own purpose. It's easy enough to remove
this one-liner function, so let's do it.
A failed send() may return ENOTCONN when the connection is not yet established.
On Linux, we generally see EAGAIN but on OpenBSD we clearly have ENOTCONN, so
let's ensure we poll for write when we encounter this error.
ev_sepoll already provides everything needed to manage FD events
by only manipulating the speculative I/O list. Nothing there is
sepoll-specific so move all this to fd.
Compression algorithms are not always supported depending on build options.
"haproxy -vv" now reports if zlib is supported and lists compression algorithms
also supported.
gcc emits this warning while building free_zlib() :
src/compression.c: In function `free_zlib':
src/compression.c:403: warning: 'pool' might be used uninitialized in this function
This is not a bug as the pool cannot take other values, but let's
pre-initialize is to null to fix the warning.
This patch adds input and output rate calcutation on the HTTP compresion
feature.
Compression can be limited with a maximum rate value in kilobytes per
second. The rate is set with the global 'maxcomprate' option. You can
change this value dynamicaly with 'set rate-limit http-compression
global' on the UNIX socket.
This optimisation causes haproxy to time out requests that result
in two TCP packets, one packet containing the header, and one
packet containing the actual data. This is a very typical type
of response from a lot of servers.
[Willy: I suspect the fix might have an impact on the compression code
which I'm not sure completely handles calls with 0 bytes to forward]
CF_READ_DONTWAIT was designed to avoid getting an EAGAIN upon recv() when
very few data are expected. It prevents the reader from looping over
recv(). Unfortunately with speculative I/O, it is very common that the
same event has the time to be called twice before the task handles the
data and disables the recv(). This is because not all tasks are always
processed at once.
Instead of leaving the buffer free-wheeling and doing an EAGAIN, we
disable reading as soon as the first recv() succeeds. This way we're
sure that only the next wakeup of the task will re-enable it if needed.
Doing so has totally removed the EAGAIN we were seeing till now (30% of
recv).
At the moment sepoll is not 100% event-driven, because a call to fd_set()
on an event which is already being polled will not change its state.
This causes issues with OpenSSL because if some I/O processing is interrupted
after clearing the I/O event (eg: read all data from a socket, can't put it
all into the buffer), then there is no way to call the SSL_read() again once
the buffer releases some space.
The only real solution is to go 100% event-driven. The principle is to use
the spec list as an event cache and that each time an I/O event is reported
by epoll_wait(), this event is automatically scheduled for addition to the
spec list for future calls until the consumer explicitly asks for polling
or stopping.
Doing this is a bit tricky because sepoll used to provide a substantial
number of optimizations such as event merging. These optimizations have
been maintained : a dedicated update list is affected when events change,
but not the event list, so that updates may cancel themselves without any
side effect such as displacing events. A specific case was considered for
handling newly created FDs as soon as they are detected from within the
poll loop. This ensures that their read or write operation will always be
attempted as soon as possible, thus reducing the number of poll loops and
process_session wakeups. This is especially true for newly accepted fds
which immediately perform their first recv() call.
Two new flags were added to the fdtab[] struct to tag the fact that a file
descriptor already exists in the update list. One flag indicates that a
file descriptor is new and has just been created (fdtab[].new) and the other
one indicates that a file descriptor is already referenced by the update list
(fdtab[].updated). Even if the FD state changes during operations or if the
fd is closed and replaced, it's not an issue because the update flag remains
and is easily spotted during list walks. The flag must absolutely reflect the
presence of the fd in the update list in order to avoid overflowing the update
list with more events than there are distinct fds.
Note that this change also recovers the small performance loss introduced
by its connection counter-part and goes even beyond.
The CO_FL_WAIT_* flags were not cleared after updating polling flags.
This means that any caller of these functions that did not clear it
would enable polling instead of speculative I/O. This happens during
the stream interface update call which is performed from the session
handler for example.
As of now it's not a problem yet because speculative I/O and polling
are handled the same way. However with upcoming changes it does cause
some deadlocks because enabling read processing on a file descriptor
where everything was already read will do nothing until something new
happens on this FD.
The correct fix consists in clearing the flags while leaving the update
functions.
This fix does not need any backport as it was introduced with recent
connection changes (dev12) and not triggered until last commit.
This is the first step of a series of changes aiming at making the
polling totally event-driven. This first change consists in only
remembering at the connection level whether an FD was enabled or not,
regardless of the fact it was being polled or cached. From now on, an
EAGAIN will always be considered as a change so that the pollers are
able to manage a cache and to flush it based on such events. One of
the noticeable effect is that conn_fd_handler() is called once more
per session (6 instead of 5 min) but other update functions are less
called.
Note that the performance loss caused by this change at the moment is
quite significant, around 2.5%, but the change is needed to have SSL
working correctly in all situations, even when data were read from the
socket and stored in the invisible cache, waiting for some room in the
channel's buffer.
There is a small waste of CPU cycles when no handshake is required on an
accepted connection, because we had to perform one call to conn_fd_handler()
to mark the connection CONNECTED and to call process_session() again to say
that nothing happened.
By marking the connection CONNECTED when there is no pending handshake, we
avoid this extra call to process_session().
Having a global expiration timer for a task means that the tasks are regularly
woken up (at least after each expiration timer). It's totally useless and counter
productive to process the whole session upon each such wakeup, and it's fairly
easy to detect such wakeups, so let's just update the task's timer and return
to sleep when this happens.
For 100k concurrent connections with 10s of timeouts, this can save 10k wakeups
per second, which is not bad.
With the global maxzlibmem option, you are able ton control the maximum
amount of RAM usable for HTTP compression.
A test is done before each zlib allocation, if the there isn't available
memory, the test fail and so the zlib initialization, so data won't be
compressed.