pcre_study() has been around long before JIT has been added. It also seems to
affect the performance in some cases (positive).
Below I've attached some test restults. The test is based on
http://sljit.sourceforge.net/regex_perf.html (see bottom). It has been modified
to just test pcre_study vs. no pcre_study. Note: This test does not try to
match specific header it's instead run over a larger text with more and less
complex patterns to make the differences more clear.
% ./runtest
'mark.txt' loaded. (Length: 19665221 bytes)
-----------------
Regex: 'Twain'
[pcre-nostudy] time: 14 ms (2388 matches)
[pcre-study] time: 21 ms (2388 matches)
-----------------
Regex: '^Twain'
[pcre-nostudy] time: 109 ms (100 matches)
[pcre-study] time: 109 ms (100 matches)
-----------------
Regex: 'Twain$'
[pcre-nostudy] time: 14 ms (127 matches)
[pcre-study] time: 16 ms (127 matches)
-----------------
Regex: 'Huck[a-zA-Z]+|Finn[a-zA-Z]+'
[pcre-nostudy] time: 695 ms (83 matches)
[pcre-study] time: 26 ms (83 matches)
-----------------
Regex: 'a[^x]{20}b'
[pcre-nostudy] time: 90 ms (12495 matches)
[pcre-study] time: 91 ms (12495 matches)
-----------------
Regex: 'Tom|Sawyer|Huckleberry|Finn'
[pcre-nostudy] time: 1236 ms (3015 matches)
[pcre-study] time: 34 ms (3015 matches)
-----------------
Regex: '.{0,3}(Tom|Sawyer|Huckleberry|Finn)'
[pcre-nostudy] time: 5696 ms (3015 matches)
[pcre-study] time: 5655 ms (3015 matches)
-----------------
Regex: '[a-zA-Z]+ing'
[pcre-nostudy] time: 1290 ms (95863 matches)
[pcre-study] time: 1167 ms (95863 matches)
-----------------
Regex: '^[a-zA-Z]{0,4}ing[^a-zA-Z]'
[pcre-nostudy] time: 136 ms (4507 matches)
[pcre-study] time: 134 ms (4507 matches)
-----------------
Regex: '[a-zA-Z]+ing$'
[pcre-nostudy] time: 1334 ms (5360 matches)
[pcre-study] time: 1214 ms (5360 matches)
-----------------
Regex: '^[a-zA-Z ]{5,}$'
[pcre-nostudy] time: 198 ms (26236 matches)
[pcre-study] time: 197 ms (26236 matches)
-----------------
Regex: '^.{16,20}$'
[pcre-nostudy] time: 173 ms (4902 matches)
[pcre-study] time: 175 ms (4902 matches)
-----------------
Regex: '([a-f](.[d-m].){0,2}[h-n]){2}'
[pcre-nostudy] time: 1242 ms (68621 matches)
[pcre-study] time: 690 ms (68621 matches)
-----------------
Regex: '([A-Za-z]awyer|[A-Za-z]inn)[^a-zA-Z]'
[pcre-nostudy] time: 1215 ms (675 matches)
[pcre-study] time: 952 ms (675 matches)
-----------------
Regex: '"[^"]{0,30}[?!\.]"'
[pcre-nostudy] time: 27 ms (5972 matches)
[pcre-study] time: 28 ms (5972 matches)
-----------------
Regex: 'Tom.{10,25}river|river.{10,25}Tom'
[pcre-nostudy] time: 705 ms (2 matches)
[pcre-study] time: 68 ms (2 matches)
In some cases it's more or less the same but when it's faster than by a huge margin.
It always depends on the pattern, the string(s) to match against etc.
Signed-off-by: Christian Ruppert <c.ruppert@babiel.com>
This converter escapes string to use it as json/ascii escaped string.
It can read UTF-8 with differents behavior on errors and encode it in
json/ascii.
json([<input-code>])
Escapes the input string and produces an ASCII ouput string ready to use as a
JSON string. The converter tries to decode the input string according to the
<input-code> parameter. It can be "ascii", "utf8", "utf8s", "utf8"" or
"utf8ps". The "ascii" decoder never fails. The "utf8" decoder detects 3 types
of errors:
- bad UTF-8 sequence (lone continuation byte, bad number of continuation
bytes, ...)
- invalid range (the decoded value is within a UTF-8 prohibited range),
- code overlong (the value is encoded with more bytes than necessary).
The UTF-8 JSON encoding can produce a "too long value" error when the UTF-8
character is greater than 0xffff because the JSON string escape specification
only authorizes 4 hex digits for the value encoding. The UTF-8 decoder exists
in 4 variants designated by a combination of two suffix letters : "p" for
"permissive" and "s" for "silently ignore". The behaviors of the decoders
are :
- "ascii" : never fails ;
- "utf8" : fails on any detected errors ;
- "utf8s" : never fails, but removes characters corresponding to errors ;
- "utf8p" : accepts and fixes the overlong errors, but fails on any other
error ;
- "utf8ps" : never fails, accepts and fixes the overlong errors, but removes
characters corresponding to the other errors.
This converter is particularly useful for building properly escaped JSON for
logging to servers which consume JSON-formated traffic logs.
Example:
capture request header user-agent len 150
capture request header Host len 15
log-format {"ip":"%[src]","user-agent":"%[capture.req.hdr(1),json]"}
Input request from client 127.0.0.1:
GET / HTTP/1.0
User-Agent: Very "Ugly" UA 1/2
Output log:
{"ip":"127.0.0.1","user-agent":"Very \"Ugly\" UA 1\/2"}
A config where a tcp-request rule appears after an http-request rule
might seem valid but it is not. So let's report a warning about this
since this case is hard to detect by the naked eye.
Some users want to add their own data types to stick tables. We don't
want to use a linked list here for performance reasons, so we need to
continue to use an indexed array. This patch allows one to reserve a
compile-time-defined number of extra data types by setting the new
macro STKTABLE_EXTRA_DATA_TYPES to anything greater than zero, keeping
in mind that anything larger will slightly inflate the memory consumed
by stick tables (not per entry though).
Then calling stktable_register_data_store() with the new keyword will
either register a new keyword or fail if the desired entry was already
taken or the keyword already registered.
Note that this patch does not dictate how the data will be used, it only
offers the possibility to create new keywords and have an index to
reference them in the config and in the tables. The caller will not be
able to use stktable_data_cast() and will have to explicitly cast the
stable pointers to the expected types. It can be used for experimentation
as well.
When we were generating a hash, it was done using an unsigned long. When the hash was used
to select a backend, it was sent as an unsigned int. This made it difficult to predict which
backend would be selected.
This patch updates get_hash, and the hash methods to use an unsigned int, to remain consistent
throughout the codebase.
This fix should be backported to 1.5 and probably in part to 1.4.
This value was set in log.h without any #ifndef around, so when one
wanted to change it, a patch was needed. Let's move it to defaults.h
with the usual #ifndef so that it's easier to change it.
The support is all based on static responses. This doesn't add any
request / response logic to HAProxy, but allows a way to update
information through the socket interface.
Currently certificates specified using "crt" or "crt-list" on "bind" lines
are loaded as PEM files.
For each PEM file, haproxy checks for the presence of file at the same path
suffixed by ".ocsp". If such file is found, support for the TLS Certificate
Status Request extension (also known as "OCSP stapling") is automatically
enabled. The content of this file is optional. If not empty, it must contain
a valid OCSP Response in DER format. In order to be valid an OCSP Response
must comply with the following rules: it has to indicate a good status,
it has to be a single response for the certificate of the PEM file, and it
has to be valid at the moment of addition. If these rules are not respected
the OCSP Response is ignored and a warning is emitted. In order to identify
which certificate an OCSP Response applies to, the issuer's certificate is
necessary. If the issuer's certificate is not found in the PEM file, it will
be loaded from a file at the same path as the PEM file suffixed by ".issuer"
if it exists otherwise it will fail with an error.
It is possible to update an OCSP Response from the unix socket using:
set ssl ocsp-response <response>
This command is used to update an OCSP Response for a certificate (see "crt"
on "bind" lines). Same controls are performed as during the initial loading of
the response. The <response> must be passed as a base64 encoded string of the
DER encoded response from the OCSP server.
Example:
openssl ocsp -issuer issuer.pem -cert server.pem \
-host ocsp.issuer.com:80 -respout resp.der
echo "set ssl ocsp-response $(base64 -w 10000 resp.der)" | \
socat stdio /var/run/haproxy.stat
This feature is automatically enabled on openssl 0.9.8h and above.
This work was performed jointly by Dirkjan Bussink of GitHub and
Emeric Brun of HAProxy Technologies.
The pcreposix layer (in the pcre projetc) execute strlen to find
thlength of the string. When we are using the function "regex_exex*2",
the length is used to add a final \0, when pcreposix is executed a
strlen is executed to compute the length.
If we are using a native PCRE api, the length is provided as an
argument, and these operations disappear.
This is useful because PCRE regex are more used than POSIC regex.
This patch remove all references of standard regex in haproxy. The last
remaining references are only in the regex.[ch] files.
In the file src/checks.c, the original function uses a "pmatch" array.
In fact this array is unused. This patch remove it.
This patchs rename the "regex_exec" to "regex_exec2". It add a new
"regex_exec", "regex_exec_match" and "regex_exec_match2" function. This
function can match regex and return array containing matching parts.
Otherwise, this function use the compiled method (JIT or PCRE or POSIX).
JIT require a subject with length. PCREPOSIX and native POSIX regex
require a null terminted subject. The regex_exec* function are splited
in two version. The first version take a null terminated string, but it
execute strlen() on the subject if it is compiled with JIT. The second
version (terminated by "2") take the subject and the length. This
version adds a null character in the subject if it is compiled with
PCREPOSIX or native POSIX functions.
The documentation of posix regex and pcreposix says that the function
returns 0 if the string matche otherwise it returns REG_NOMATCH. The
REG_NOMATCH macro take the value 1 with posix regex and the value 17
with the pcreposix. The documentaion of the native pcre API (used with
JIT) returns a negative number if no match, otherwise, it returns 0 or a
positive number.
This patch fix also the return codes of the regex_exec* functions. Now,
these function returns true if the string match, otherwise it returns
false.
Using the last rate counters, we now compute the queue, connect, response
and total times per server and per backend with a 95% accuracy over the last
1024 samples. The operation is cheap so we don't need to condition it.
qstr() and cstr() will be used to quote-encode strings. The first one
does it unconditionally. The second one is aimed at CSV files where the
quote-encoding is only needed when the field contains a quote or a comma.
This helper is similar to addr_to_str but
tries to convert the port rather than the address
of a struct sockaddr_storage.
This is in preparation for supporting
an external agent check.
Signed-off-by: Simon Horman <horms@verge.net.au>
When no static DH parameters are specified, this patch makes haproxy
use standardized (rfc 2409 / rfc 3526) DH parameters with prime lenghts
of 1024, 2048, 4096 or 8192 bits for DHE key exchange. The size of the
temporary/ephemeral DH key is computed as the minimum of the RSA/DSA server
key size and the value of a new option named tune.ssl.default-dh-param.
Dmitry Sivachenko reported that "uint" doesn't build on FreeBSD 10.
On Linux it's defined in sys/types.h and indicated as "old". Just
get rid of the very few occurrences.
Currently exp_replace() (which is used in reqrep/reqirep) is
vulnerable to a buffer overrun. I have been able to reproduce it using
the attached configuration file and issuing the following command:
wget -O - -S -q http://localhost:8000/`perl -e 'print "a"x4000'`/cookie.php
Str was being checked only in in while (str) and it was possible to
read past that when more than one character was being accessed in the
loop.
WT:
Note that this bug is only marked MEDIUM because configurations
capable of triggering this bug are very unlikely to exist at all due
to the fact that most rewrites consist in static string additions
that largely fit into the reserved area (8kB by default).
This fix should also be backported to 1.4 and possibly even 1.3 since
it seems to have been present since 1.1 or so.
Config:
-------
global
maxconn 500
stats socket /tmp/haproxy.sock mode 600
defaults
timeout client 1000
timeout connect 5000
timeout server 5000
retries 1
option redispatch
listen stats
bind :8080
mode http
stats enable
stats uri /stats
stats show-legends
listen tcp_1
bind :8000
mode http
maxconn 400
balance roundrobin
reqrep ^([^\ :]*)\ /(.*)/(.*)\.php(.*) \1\ /\3.php?arg=\2\2\2\2\2\2\2\2\2\2\2\2\2\4
server srv1 127.0.0.1:9000 check port 9000 inter 1000 fall 1
server srv2 127.0.0.1:9001 check port 9001 inter 1000 fall 1
git.1wt.eu is painfully slow and some people experience issues with
it. Better hide it and only advertise git.haproxy.org which is mirrored
on a faster server.
Also replace haproxy.1wt.eu with www.haproxy.org in the download URL
which appears in the stats page.
The is_addr() function indicates if an address is set and is an IPv4
or IPv6 address. Let's rename it is_inet_addr() and make is_addr()
also accept AF_UNIX addresses.
Some consistency checks cannot be performed between frontends, backends
and peers at the moment because there is no way to check for intersection
between processes bound to some processes when the number of processes is
higher than the number of bits in a word.
So first, let's limit the number of processes to the machine's word size.
This means nbproc will be limited to 32 on 32-bit machines and 64 on 64-bit
machines. This is far more than enough considering that configs rarely go
above 16 processes due to scalability and management issues, so 32 or 64
should be fine.
This way we'll ensure we can always build a mask of all the processes a
section is bound to.
Trying to build with an old gcc and glibc revealed that we must not
state "inline" in our _syscall* definitions since it's already present
in the declaration making use of the _syscall* macros.
The cfgparse.c file becomes huge, and a large part of it comes from the
server keyword parser. Since the configuration is a bit more modular now,
move this parser to server.c.
This patch also moves the check of the "server" keyword earlier in the
supported keywords list, resulting in a slightly faster config parsing
for configs with large numbers of servers (about 10%).
No functional change was made, only the code was moved.
This patch permit to register new sections in the haproxy's
configuration file. This run like all the "keyword" registration, it is
used during the haproxy initialization, typically with the
"__attribute__((constructor))" functions.
The function url2sa() converts faster url like http://<ip>:<port> in a
struct sockaddr_storage. This patch add:
- the https support
- permit to return the length parsed
- support IPv6
- support DNS synchronous resolution only during start of haproxy.
The faster IPv4 convertion way is keeped. IPv6 is slower, because I use
the standard IPv6 parser function.
The function str2net runs DNS resolution if valid ip cannot be parsed.
The DNS function used is the standard function of the libc and it
performs asynchronous request.
The asynchronous request is not compatible with the haproxy
archictecture.
str2net() is used during the runtime throught the "socket".
This patch remove the DNS resolution during the runtime.
The pointer <regstr> is only used to compare and identify the original
regex string with the patterns. Now the patterns have a reference map
containing this original string. It is useless to store this value two
times.
The goal of these patch is to simplify the prototype of
"pat_pattern_*()" functions. I want to replace the argument "char
**args" by a simple "char *arg" and remove the "opaque" argument.
"pat_parse_int()" and "pat_parse_dotted_ver()" are the unique pattern
parser using the "opaque" argument and using more than one string
argument of the char **args. These specificities are only used with ACL.
Other systems using this pattern parser (MAP and CLI) just use one
string for describing a range.
This two functions can read a range, but the min and the max must y
specified. This patch extends the syntax to describe a range with
implicit min and max. This is used for operators like "lt", "le", "gt",
and "ge". the syntax is the following:
":x" -> no min to "x"
"x:" -> "x" to no max
This patch moves the parsing of the comparison operator from the
functions "pat_parse_int()" and "pat_parse_dotted_ver()" to the acl
parser. The acl parser read the operator and the values and build a
volatile string readable by the functions "pat_parse_int()" and
"pat_parse_dotted_ver()". The transformation is done with these rules:
If the parser is "pat_parse_int()":
"eq x" -> "x"
"le x" -> ":x"
"lt x" -> ":y" (with y = x - 1)
"ge x" -> "x:"
"gt x" -> "y:" (with y = x + 1)
If the parser is "pat_parse_dotted_ver()":
"eq x.y" -> "x.y"
"le x.y" -> ":x.y"
"lt x.y" -> ":w.z" (with w.z = x.y - 1)
"ge x.y" -> "x.y:"
"gt x.y" -> "w.z:" (with w.z = x.y + 1)
Note that, if "y" is not present, assume that is "0".
Now "pat_parse_int()" and "pat_parse_dotted_ver()" accept only one
pattern and the variable "opaque" is no longer used. The prototype of
the pattern parsers can be changed.
Till now, we had one flag per stick counter to indicate if it was
tracked in a backend or in a frontend. We just had to add another
flag per stick-counter to indicate if it relies on contents or just
connection. These flags are quite painful to maintain and tend to
easily conflict with other flags if their number is changed.
The correct solution consists in moving the flags to the stkctr struct
itself, but currently this struct is made of 2 pointers, so adding a
new entry there to store only two bits will cause at least 16 more bytes
to be eaten per counter due to alignment issues, and we definitely don't
want to waste tens to hundreds of bytes per session just for things that
most users don't use.
Since we only need to store two bits per counter, an intermediate
solution consists in replacing the entry pointer with a composite
value made of the original entry pointer and the two flags in the
2 unused lower bits. If later a need for other flags arises, we'll
have to store them in the struct.
A few inline functions have been added to abstract the retrieval
and assignment of the pointers and flags, resulting in very few
changes. That way there is no more dependence on the number of
stick-counters and their position in the session flags.
Very often we want to associate one or two flags to a pointer, to
put a type on it or whatever. This patch provides this in standard.h
in the form of a few inline functions which combine a void * pointer
with an int and return an unsigned long called a composite address.
The functions allow to individuall set, retrieve both the pointer and
the flags. This is very similar to what is used in ebtree in fact.
One year ago, commit 5d5b5d8 ("MEDIUM: proto_tcp: add support for tracking
L7 information") brought support for tracking L7 information in tcp-request
content rules. Two years earlier, commit 0a4838c ("[MEDIUM] session-counters:
correctly unbind the counters tracked by the backend") used to flush the
backend counters after processing a request.
While that earliest patch was correct at the time, it became wrong after
the second patch was merged. The code does what it says, but the concept
is flawed. "TCP request content" rules are evaluated for each HTTP request
over a single connection. So if such a rule in the frontend decides to
track any L7 information or to track L4 information when an L7 condition
matches, then it is applied to all requests over the same connection even
if they don't match. This means that a rule such as :
tcp-request content track-sc0 src if { path /index.html }
will count one request for index.html, and another one for each of the
objects present on this page that are fetched over the same connection
which sent the initial matching request.
Worse, it is possible to make the code do stupid things by using multiple
counters:
tcp-request content track-sc0 src if { path /foo }
tcp-request content track-sc1 src if { path /bar }
Just sending two requests first, one with /foo, one with /bar, shows
twice the number of requests for all subsequent requests. Just because
both of them persist after the end of the request.
So the decision to flush backend-tracked counters was not the correct
one. In practice, what is important is to flush countent-based rules
since they are the ones evaluated for each request.
Doing so requires new flags in the session however, to keep track of
which stick-counter was tracked by what ruleset. A later change might
make this easier to maintain over time.
This bug is 1.5-specific, no backport to stable is needed.
show pools
Dump the status of internal memory pools. This is useful to track memory
usage when suspecting a memory leak for example. It does exactly the same
as the SIGQUIT when running in foreground except that it does not flush
the pools.
Recent commit 4448925 ("BUILD/MINOR: listener: remove a glibc warning on accept4()")
broke accept4() on some systems because the glibc's version may now conflict with
the local one.
It's becoming increasingly difficult to ignore unwanted function returns in
debug code with gcc. Now even when you try to work around it, it suggests a
way to write your code differently. For example :
src/frontend.c:187:65: warning: if statement has empty body [-Wempty-body]
if (write(1, trash.str, trash.len) < 0) /* shut gcc warning */;
^
src/frontend.c:187:65: note: put the semicolon on a separate line to silence this warning
1 warning generated.
This is totally unacceptable, this code already had to be written this way
to shut it up in earlier versions. And now it comments the form ? What's the
purpose of the C language if you can't write anymore the code that does what
you want ?
Emeric proposed to just keep a global variable to drain such useless results
so that gcc stops complaining all the time it believes people who write code
are monkeys. The solution is acceptable because the useless assignment is done
only in debug code so it will not impact performance. This patch implements
this, until gcc becomes even "smarter" to detect that we tried to cheat.
Some systems use different types for tv_sec/tv_usec, some are
signed others not. From time to time new warnings are reported
about implicit casts being done.
This patch ensures that TV_ETERNITY is cast to the appropriate
type in assignments and conversions.
We currently use such an hex parser in pat_parse_bin() to parse hex
string patterns. We'll need another generic one so let's move it to
standard.c and have pat_parse_bin() make use of it.
The inet_pton function needs an input string with a final \0. This
function copies the input string to a temporary buffer, adds the final
\0 and converts to address.
This is achieved by moving rise and fall from struct server to struct check.
After this move the behaviour of the primary check, server->check is
unchanged. However, the secondary agent check, server->agent now has
independent rise and fall values each of which are set to 1.
The result is that receiving "fail", "stopped" or "down" just once from the
agent will mark the server as down. And receiving a weight just once will
allow the server to be marked up if its primary check is in good health.
This opens up the scope to allow the rise and fall values of the agent
check to be configurable, however this has not been implemented at this
stage.
Signed-off-by: Simon Horman <horms@verge.net.au>
This function was designed for haproxy while testing other functions
in the past. Initially it was not planned to be used given the not
very interesting numbers it showed on real URL data : it is not as
smooth as the other ones. But later tests showed that the other ones
are extremely sensible to the server count and the type of input data,
especially DJB2 which must not be used on numeric input. So in fact
this function is still a generally average performer and it can make
sense to merge it in the end, as it can provide an alternative to
sdbm+avalanche or djb2+avalanche for consistent hashing or when hashing
on numeric data such as a source IP address or a visitor identifier in
a URL parameter.
Summary:
In testing at tumblr, we found that using djb2 hashing instead of the
default sdbm hashing resulted is better workload distribution to our backends.
This commit implements a change, that allows the user to specify the hash
function they want to use. It does not limit itself to consistent hashing
scenarios.
The supported hash functions are sdbm (default), and djb2.
For a discussion of the feature and analysis, see mailing list thread
"Consistent hashing alternative to sdbm" :
http://marc.info/?l=haproxy&m=138213693909219
Note: This change does NOT make changes to new features, for instance,
applying an avalance hashing always being performed before applying
consistent hashing.
If haproxy is compiled with the USE_PCRE_JIT option, the length of the
string is used. If it is compiled without this option the function doesn't
use the length and expects a null terminated string.
The prototype of the function is ambiguous, and depends on the
compilation option. The developer can think that the length is always
used, and many bugs can be created.
This patch makes sure that the length is used. The regex_exec function
adds the final '\0' if it is needed.
The current file "regex.h" define an abstraction for the regex. It
provides the same struct name and the same "regexec" function for the
3 regex types supported: standard libc, basic pcre and jit pcre.
The regex compilation function is not provided by this file. If the
developper wants to use regex, he must write regex compilation code
containing "#define *JIT*".
This patch provides a unique regex compilation function according to
the compilation options.
In addition, the "regex.h" file checks the presence of the "#define
PCRE_CONFIG_JIT" when "USE_PCRE_JIT" is enabled. If this flag is not
present, the pcre lib doesn't support JIT and "#error" is emitted.
The "set table" statement allows to create new entries with their respective
values. Till now it was limited to a single data type per line, requiring as
many "set table" statements as the desired data types to be set. Since this
is only a parser limitation, this patch gets rid of it. It also allows the
creation of a key with no data types (all reset to their default values).
In preparation of more flexibility in the stick counters, make their
number configurable. It still defaults to 3 which is the minimum
accepted value. Changing the value alone is not sufficient to get
more counters, some bitfields still need to be updated and the TCP
actions need to be updated as well, but this update tries to be
easier, which is nice for experimentation purposes.
As per RFC3260 #4 and BCP37 #4.2 and #5.2, the IPv6 counterpart of TOS
is "traffic class".
Add support for IPv6 traffic class in "set-tos" by moving the "set-tos"
related code to the new inline function inet_set_tos(), handling IPv4
(IP_TOS), IPv6 (IPV6_TCLASS) and IPv4-mapped sockets (IP_TOS, like
::ffff:127.0.0.1).
Also define - if missing - the IN6_IS_ADDR_V4MAPPED() macro in
include/common/compat.h for compatibility.
Benoit Dolez reported a failure to start haproxy 1.5-dev19. The
process would immediately report an internal error with missing
fetches from some crap instead of ACL names.
The cause is that some versions of gcc seem to trim static structs
containing a variable array when moving them to BSS, and only keep
the fixed size, which is just a list head for all ACL and sample
fetch keywords. This was confirmed at least with gcc 3.4.6. And we
can't move these structs to const because they contain a list element
which is needed to link all of them together during the parsing.
The bug indeed appeared with 1.5-dev19 because it's the first one
to have some empty ACL keyword lists.
One solution is to impose -fno-zero-initialized-in-bss to everyone
but this is not really nice. Another solution consists in ensuring
the struct is never empty so that it does not move there. The easy
solution consists in having a non-null list head since it's not yet
initialized.
A new "ILH" list head type was thus created for this purpose : create
an Initialized List Head so that gcc cannot move the struct to BSS.
This fixes the issue for this version of gcc and does not create any
burden for the declarations.
Since commit cfd97c6f was merged into 1.5-dev14 (BUG/MEDIUM: checks:
prevent TIME_WAITs from appearing also on timeouts), some valid health
checks sometimes used to show some TCP resets. For example, this HTTP
health check sent to a local server :
19:55:15.742818 IP 127.0.0.1.16568 > 127.0.0.1.8000: S 3355859679:3355859679(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742841 IP 127.0.0.1.8000 > 127.0.0.1.16568: S 1060952566:1060952566(0) ack 3355859680 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:15.742863 IP 127.0.0.1.16568 > 127.0.0.1.8000: . ack 1 win 257
19:55:15.745402 IP 127.0.0.1.16568 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:15.745488 IP 127.0.0.1.8000 > 127.0.0.1.16568: FP 1:146(145) ack 23 win 257
19:55:15.747109 IP 127.0.0.1.16568 > 127.0.0.1.8000: R 23:23(0) ack 147 win 257
After some discussion with Chris Huang-Leaver, it appeared clear that
what we want is to only send the RST when we have no other choice, which
means when the server has not closed. So we still keep SYN/SYN-ACK/RST
for pure TCP checks, but don't want to see an RST emitted as above when
the server has already sent the FIN.
The solution against this consists in implementing a "drain" function at
the protocol layer, which, when defined, causes as much as possible of
the input socket buffer to be flushed to make recv() return zero so that
we know that the server's FIN was received and ACKed. On Linux, we can make
use of MSG_TRUNC on TCP sockets, which has the benefit of draining everything
at once without even copying data. On other platforms, we read up to one
buffer of data before the close. If recv() manages to get the final zero,
we don't disable lingering. Same for hard errors. Otherwise we do.
In practice, on HTTP health checks we generally find that the close was
pending and is returned upon first recv() call. The network trace becomes
cleaner :
19:55:23.650621 IP 127.0.0.1.16561 > 127.0.0.1.8000: S 3982804816:3982804816(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650644 IP 127.0.0.1.8000 > 127.0.0.1.16561: S 4082139313:4082139313(0) ack 3982804817 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
19:55:23.650666 IP 127.0.0.1.16561 > 127.0.0.1.8000: . ack 1 win 257
19:55:23.651615 IP 127.0.0.1.16561 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
19:55:23.651696 IP 127.0.0.1.8000 > 127.0.0.1.16561: FP 1:146(145) ack 23 win 257
19:55:23.652628 IP 127.0.0.1.16561 > 127.0.0.1.8000: F 23:23(0) ack 147 win 257
19:55:23.652655 IP 127.0.0.1.8000 > 127.0.0.1.16561: . ack 24 win 257
This change should be backported to 1.4 which is where Chris encountered
this issue. The code is different, so probably the tcp_drain() function
will have to be put in the checks only.
FreeBSD uses (IPPROTO_IP, IP_BINDANY) and (IPPROTO_IPV6, IPV6_BINDANY)
to enable transparent proxy on a socket.
This patch adds support for the relevant setsockopt() calls.
This patch does not change the logic of the code, it only changes the
way OS-specific defines are tested.
At the moment the transparent proxy code heavily depends on Linux-specific
defines. This first patch introduces a new define "CONFIG_HAP_TRANSPARENT"
which is set every time the defines used by transparent proxy are present.
This also means that with an up-to-date libc, it should not be necessary
anymore to force CONFIG_HAP_LINUX_TPROXY during the build, as the flags
will automatically be detected.
The CTTPROXY flags still remain separate because this older API doesn't
work the same way.
A new line has been added in the version output for haproxy -vv to indicate
what transparent proxy support is available.
When freeing ACL regex, we don't want to perform the free() in regex_free()
as it's already performed in free_pattern(). The double free only happens
when using PCRE_JIT when freeing everything during exit so it's harmless
but exhibits libc errors during a reload/restart.
Bug reported by Seri.
This patch adds a "scope" box in the statistics page in order to
display only proxies with a name that contains the requested value.
The scope filter is preserved across all clicks on the page.
TCP Fast Open is supported in server mode since Linux 3.7, but current
libc's don't define TCP_FASTOPEN=23. Introduce the new USE flag USE_TFO
to define it manually in compat.h. Also note this in the TFO related
documentation.
Now that all addresses are parsed using str2sa_range(), it becomes easy
to add support for environment variables and use them everywhere an address
is needed. Environment variables are used as $VAR or ${VAR} as in shell.
Any number of variables may compose an address, allowing various fantasies
such as "fd@${FD_HTTP}" or "${LAN_DC1}.1:80".
These ones are usable in logs, bind, servers, peers, stats socket, source,
dispatch, and check address.
This change allows one to force the address family in any address parsed
by str2sa_range() by specifying it as a prefix followed by '@' then the
address. Currently supported address prefixes are 'ipv4@', 'ipv6@', 'unix@'.
This also helps forcing resolving for host names (when getaddrinfo is used),
and force the family of the empty address (eg: 'ipv4@' = 0.0.0.0 while
'ipv6@' = ::).
The main benefits is that unix sockets can now get a local name without
being forced to begin with a slash. This is useful during development as
it is no longer necessary to have stats socket sent to /tmp.
Don't use a statically allocated address both for str2ip and str2sa_range,
use the same. The inet and unix code paths have been splitted a little
better to improve readability.
We'll need str2sa_range() to support a prefix for unix sockets. Since
we don't always want to use it (eg: stats socket), let's not take it
unconditionally from global but let the caller pass it.
An invalid copy-paste called it NR_splice instead of NR_accept4.
This does not lead to real issues because if this define is used,
then the code cannot compile since NR_accept4 is still missing.
Right now we have multiple methods for parsing IP addresses in the
configuration. This is quite painful. This patch aims at adapting
str2sa_range() to make it support all formats, so that the callers
perform the appropriate tests on the return values. str2sa() was
changed to simply return str2sa_range().
The output values are now the following ones (taken from the comment
on top of the function).
Converts <str> to a locally allocated struct sockaddr_storage *, and a port
range or offset consisting in two integers that the caller will have to
check to find the relevant input format. The following format are supported :
String format | address | port | low | high
addr | <addr> | 0 | 0 | 0
addr: | <addr> | 0 | 0 | 0
addr:port | <addr> | <port> | <port> | <port>
addr:pl-ph | <addr> | <pl> | <pl> | <ph>
addr:+port | <addr> | <port> | 0 | <port>
addr:-port | <addr> |-<port> | <port> | 0
The detection of a port range or increment by the caller is made by
comparing <low> and <high>. If both are equal, then port 0 means no port
was specified. The caller may pass NULL for <low> and <high> if it is not
interested in retrieving port ranges.
Note that <addr> above may also be :
- empty ("") => family will be AF_INET and address will be INADDR_ANY
- "*" => family will be AF_INET and address will be INADDR_ANY
- "::" => family will be AF_INET6 and address will be IN6ADDR_ANY
- a host name => family and address will depend on host name resolving.
This corrects what appears to be logic errors in cut_crlf().
I assume that the intention of this function is to truncate a
string at the first cr or lf. However, currently lf are ignored.
Also use '\0' instead of 0 as the null character, a cosmetic change.
Cc: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Simon Horman <horms@verge.net.au>
[WT: this fix may be backported to 1.4 too]
When a frontend is rate-limited to 1000 connections per second, the
effective rate measured from the client is 999/s, and connections
experience an average response time of 99.5 ms with a standard
deviation of 2 ms.
The reason for this inaccuracy is that when computing frequency
counters, we use one part of the previous value proportional to the
number of milliseconds remaining in the current second. But even the
last millisecond still uses a part of the past value, which is wrong :
since we have a 1ms resolution, the last millisecond must be dedicated
only to filling the current second.
So we slightly adjust the algorithm to use 999/1000 of the past value
during the first millisecond, and 0/1000 of the past value during the
last millisecond. We also slightly improve the computation by computing
the remaining time instead of the current time in tv_update_date(), so
that we don't have to negate the value in each frequency counter.
Now with the fix, the connection rate measured by both the client and
haproxy is a steady 1000/s, the average response time measured is 99.2ms
and more importantly, the standard deviation has been divided by 3 to
0.6 millisecond.
This fix should also be backported to 1.4 which has the same issue.
These macros (U2H, U2A, LIM2A, ...) have been used with an explicit
index for the local storage variable, making it difficult to change
log formats and causing a few issues from time to time. Let's have
a single macro with a rotating index so that up to 10 conversions
may be used in a single call.
At the moment, we need trash chunks almost everywhere and the only
correctly implemented one is in the sample code. Let's move this to
the chunks so that all other places can use this allocator.
Additionally, the get_trash_chunk() function now really returns two
different chunks. Previously it used to always overwrite the same
chunk and point it to a different buffer, which was a bit tricky
because it's not obvious that two consecutive results do alias each
other.
This is done by passing the default value to SSLCACHESIZE in sessions.
User can use tune.sslcachesize to change this value.
By default, it is set to 20000 sessions as openssl internal cache size.
Currently, a session entry size is between 592 and 616 bytes depending on the arch.
Now that all pollers make use of speculative I/O, there is no point
having two epoll implementations, so replace epoll with the sepoll code
and remove sepoll which has just become the standard epoll method.
It is stupid to loop over ->snd_buf() because the snd_buf() itself already
loops and stops when system buffers are full. But looping again onto it,
we lose the information of the full buffers and perform one useless syscall.
Furthermore, this causes issues when dealing with large uploads while waiting
for a connection to establish, as it can report a server reject of some data
as a connection abort, which is wrong.
1.4 does not have this issue as it loops maximum twice (once for each buffer
half) and exists as soon as system buffers are full. So no backport is needed.
This function's naming was misleading as it is used to append data
at the end of a string, causing some surprizes when used for the
first time!
Add a chunk_printf() function which does what its name suggests.
This is a first step in avoiding to constantly reinitialize chunks.
It replaces the old chunk_reset() which was not properly named as it
used to drop everything and was only used by chunk_destroy(). It has
been renamed chunk_drop().
This tiny function was not inlined because initially not much used.
However it's been used un the chunk parser for a while and it became
one of the most CPU-cycle eater there. By inlining it, the chunk parser
speed was increased by 74 %. We're almost 3 times faster than original
with just the last 4 commits.
It's sometimes needed to be able to compare a zero-terminated string with a
chunk, so we now have two functions to do that, one strcmp() equivalent and
one strcasecmp() equivalent.
With this commit, we now separate the channel from the buffer. This will
allow us to replace buffers on the fly without touching the channel. Since
nobody is supposed to keep a reference to a buffer anymore, doing so is not
a problem and will also permit some copy-less data manipulation.
Interestingly, these changes have shown a 2% performance increase on some
workloads, probably due to a better cache placement of data.
On Linux, accept4() does the same as accept() except that it allows
the caller to specify some flags to set on the resulting socket. We
use this to set the O_NONBLOCK flag and thus to save one fcntl()
call in each connection. The effect is a small performance gain of
around 1%.
The option is automatically enabled when target linux2628 is set, or
when the USE_ACCEPT4 Makefile variable is set. If the libc is too old
to provide the equivalent function, this is automatically detected and
our own function is used instead. In any case it is possible to force
the use of our implementation with USE_MY_ACCEPT4.
These ones are used to set the default ciphers suite on "bind" lines and
"server" lines respectively, instead of using OpenSSL's defaults. These
are probably mainly useful for distro packagers.
Alex Markham reported and diagnosed a bug appearing on 1.5-dev11,
causing a crash on x86_64 when header hashing is used. The cause is
a missing (int) cast causing a negative offset to appear positive
and the resulting pointer to go out of bounds.
The crash is not possible anymore since 1.5-dev12 because a second
bug caused the negative sign to disappear so the pointer is always
within range but always wrong, so balance hdr() never works anymore.
This fix restores the correct behaviour and ensures the sign is
correct.
Bind parsers may return multiple errors, so let's make use of a new function
to re-indent multi-line error messages so that they're all reported in their
context.
It appears that fd.h includes a number of unneeded files and was
included from standard.h, and as such served as an intermediary
to provide almost everything to everyone.
By removing its useless includes, a long dependency chain broke
but could easily be fixed.
I/O handlers now all use __conn_{sock,data}_{stop,poll,want}_* instead
of returning dummy flags. The code has become slightly simpler because
some tricks such as the MIN_RET_FOR_READ_LOOP are not needed anymore,
and the data handlers which switch to a handshake handler do not need
to disable themselves anymore.
These functions do not depend on the channel flags anymore thus they're
much better suited to be used on plain buffers. Move them from channel
to buffer.
This macro is usable like printf but sends messages to fd #-1, which has no
visible effect but is easy to spot in strace. This is very useful to put
tracers at many points during debugging sessions.
All keywords registered using a cfg_kw_list now make use of the new error reporting
framework. This allows easier and more precise error reporting without having to
deal with complex buffer allocation issues.
From time to time, some bugs are discovered that are caused by non-initialized
memory areas. It happens that most platforms return a zero-filled area upon
first malloc() thus hiding potential bugs. This patch also replaces malloc()
in pools with calloc() to ensure that all platforms exhibit the same behaviour
upon startup. In order to catch these bugs more easily, add a -dM command line
flag to enable memory poisonning. Optionally, passing -dM<byte> forces the
poisonning byte to <byte>.
memprintf() is just like snprintf() except that it always returns a properly
sized allocated string that the caller is responsible for freeing. NULL is
returned on serious errors. It also supports stackable calls over the same
pointer since it offers support for automatically freeing a previous one :
memprintf(&err, "invalid argument: '%s'", arg);
...
memprintf(&err, "keyword parser said: <%s>", *err);
...
memprintf(&err, "line parser said: %s\n", *err);
...
free(*err);
(from ebtree 6.0.6)
This version is mainly aimed at clarifying the fact that the ebtree license
is LGPL. Some files used to indicate LGPL and other ones GPL, while the goal
clearly is to have it LGPL. A LICENSE file has also been added.
No code is affected, but it's better to have the local tree in sync anyway.
(cherry picked from commit 24dc7cca051f081600fe8232f33e55ed30e88425)
For a long time, the max number of headers was taken as a part of the buffer
size. Since the header size can be configured at runtime, it does not make
much sense anymore.
Nothing was making it necessary to have a static value, so let's turn this into
a tunable with a default value of 101 which equals what was previously used.
By default, pipes are the default size for the system. But sometimes when
using TCP splicing, it can improve performance to increase pipe sizes,
especially if it is suspected that pipes are not filled and that many
calls to splice() are performed. This has an impact on the kernel's
memory footprint, so this must not be changed if impacts are not understood.
We now measure the work and idle times in order to report the idle
time in the stats. It's expected that we'll be able to use it at
other places later.
Many inet_ntop calls were partially right, which was hard to detect given
the complex combinations. Some of them were relying on the listener's proto
instead of the address itself, which could have been different when dealing
with an accept-proxy connection.
The new addr_to_str() function does the dirty job and returns the family, which
makes it particularly suited to calls from switch/case statements. A large number
of if/else statements were removed and the stats output could even be cleaned up
in the case of session dump.
As a side effect of doing this, the resulting code is smaller by almost 1kB.
All changed parts have been tested and provided expected output.
Some older libc don't define splice() and and don't define _syscall*()
either, which causes build errors if splicing is enabled.
To solve this, we now split the syscall redefinition into two layers :
- one file per syscall (epoll, splice)
- one common file to declare the _syscall*() macros
The code is cleaner because files using the syscalls just have to include
their respective file. It's not adviced to merge multiple syscall families
into a same file if all are not intended to be used simultaneously, because
defining unused static functions causes warnings to be emitted during build.
As a result, the new USE_MY_SPLICE parameter was added in order to be able
to define the splice() syscall separately.
This patch provides a new "option redis-check" statement to enable server health checks based on redis PING request (http://www.redis.io/commands/ping).
apsession_refresh() and apsess_refressh are only used inside apsession.c
and thus can be made static.
The only use of apsession_refresh() is appsession_task_init().
These functions have been re-ordered to avoid the need for
a forward-declaration of apsession_refresh().
Bashkim Kasa reported that the stats admin page did not work when colons
were used in server or backend names. This was caused by url-encoding
resulting in ':' being sent as '%3A'. Now we systematically decode the
field names and values to fix this issue.
It's more expensive to call splice() on short payloads than to use
recv()+send(). One of the reasons is that doing a splice() involves
allocating a pipe. One other reason is that the kernel will have to
copy itself if we try to splice less than a page. So let's fix a
short offset of 4kB below which we don't splice.
A quick test shows that on chunked encoded data, with splice we had
6826 syscalls (1715 splice, 3461 recv, 1650 send) while with this
patch, the same transfer resulted in 5793 syscalls (3896 recv, 1897
send).
John Helliwell reported a runtime issue on Solaris since 1.5-dev5. Traces
show that connect() returns EINVAL, which means the socket length is not
appropriate for the family. Solaris does not like being called with sizeof
and needs the address family's size on sockaddr_storage.
The fix consists in adding a get_addr_len() function which returns the
socket's address length based on its family. Tests show that this works
for both IPv4 and IPv6 addresses.
Since IPv6 is a different type than IPv4, the pattern fetch functions
src6 and dst6 were added. IPv6 stick-tables can also fetch IPv4 addresses
with src and dst. In this case, the IPv4 addresses are mapped to their
IPv6 counterpart, according to RFC 4291.
The parser now distinguishes between pure addresses and address:port. This
is useful for some config items where only an address is required.
Raw IPv6 addresses are now parsed, but IPv6 host name resolution is still not
handled (gethostbyname does not resolve IPv6 names to addresses).
And also rename "req_acl_rule" "http_req_rule". At the beginning that
was a bit confusing to me, especially the "req_acl" list which in fact
holds what we call rules. After some digging, it appeared that some
part of the code is 100% HTTP and not just related to authentication
anymore, so let's move that part to HTTP and keep the auth-only code
in auth.c.
This patch turns internal server addresses to sockaddr_storage to
store IPv6 addresses, and makes the connect() function use it. This
code already works but some caveats with getaddrinfo/gethostbyname
still need to be sorted out while the changes had to be merged at
this stage of internal architecture changes. So for now the config
parser will not emit an IPv6 address yet so that user experience
remains unchanged.
This change should have absolutely zero user-visible effect, otherwise
it's a bug introduced during the merge, that should be reported ASAP.
We'll use this hash at other places, let's make it globally available.
The function has also been renamed because its "chash_hash" name was
not appropriate.
MAXPATHLEN may be used at other places, it's unconvenient to have it
redefined in a few files. Also, since checking it requires including
sys/param.h, some versions of it cause a macro declaration conflict
with MIN/MAX which are defined in tools.h. The solution consists in
including sys/param.h in both files so that we ensure it's loaded
before the macros are defined and MAXPATHLEN is checked.
inetaddr_host_lim_ret() used to make use of const char** for some
args, but that make it impossible ot use char** due to the way
controls are made by gcc. So let's change that.
The stats web interface must be read-only by default to prevent security
holes. As it is now allowed to enable/disable servers, a new keyword
"stats admin" is introduced to activate this admin level, conditioned by ACLs.
(cherry picked from commit 5334bab92ca7debe36df69983c19c21b6dc63f78)
These functions only require 5 chars to encode 30 bits, and don't expect
any padding. They will be used to encode dates in cookies.
(cherry picked from commit a7e2b5fc4612994c7b13bcb103a4a2c3ecd6438a)
In all cookie persistence modes but prefix, we now support cookies whose
value is suffixed with some contents after a vertical bar ('|'). This will
be used to pass an optional expiration date. So as of now we only consider
the part of the cookie value which is used before the vertical bar.
(cherry picked from commit a4486bf4e5b03b5a980d03fef799f6407b2c992d)
This patch provides a new "option ldap-check" statement to enable
server health checks based on LDAPv3 bind requests.
(cherry picked from commit b76b44c6fed8a7ba6f0f565dd72a9cb77aaeca7c)
Some config parsing functions need to return composite status codes
when they rely on other functions. Let's provide a few such codes
for general use and extend them later.
We'll need to divide 64 bits by 32 bits with new frequency counters.
Gcc does not know when it can safely do that, but the way we build
our operations let us be sure. So let's provide an optimised version
for that purpose.
The quote_arg() function can be used to quote an argument or indicate
"end of line" if it's null or empty. It should be useful to more precisely
report location of problems in the configuration.
pattern.c depended on stick_table while in fact it should be the opposite.
So we move from pattern.c everything related to stick_tables and invert the
dependency. That way the code becomes more logical and intuitive.
Using get_ip_from_hdr2() we can look for occurrence #X or #-X and
extract the IP it contains. This is typically designed for use with
the X-Forwarded-For header.
Using "usesrc hdr_ip(name,occ)", it becomes possible to use the IP address
found in <name>, and possibly specify occurrence number <occ>, as the
source to connect to a server. This is possible both in a server and in
a backend's source statement. This is typically used to use the source
IP previously set by a upstream proxy.
today I've noticed that the stats page still displays v1.3 in the
"Updates" link, due to the PRODUCT_BRANCH value in version.h, then
it's maybe time to send you the result (notice that the patch updates
PRODUCT_BRANCH to "1.4").
--
Cyril Bont
Support the new syntax (http-request allow/deny/auth) in
http stats.
Now it is possible to use the same syntax is the same like in
the frontend/backend http-request access control:
acl src_nagios src 192.168.66.66
acl stats_auth_ok http_auth(L1)
stats http-request allow if src_nagios
stats http-request allow if stats_auth_ok
stats http-request auth realm LB
The old syntax is still supported, but now it is emulated
via private acls and an aditional userlist.
Add generic authentication & authorization support.
Groups are implemented as bitmaps so the count is limited to
sizeof(int)*8 == 32.
Encrypted passwords are supported with libcrypt and crypt(3), so it is
possible to use any method supported by your system. For example modern
Linux/glibc instalations support MD5/SHA-256/SHA-512 and of course classic,
DES-based encryption.
Implement Base64 decoding with a reverse table.
The function accepts and decodes classic base64 strings, which
can be composed from many streams as long each one is properly
padded, for example: SGVsbG8=IEhBUHJveHk=IQ==
Currently we cannot easily add headers nor anything to HTTP checks
because the requests are pre-formatted with the last CRLF. Make the
check code add the CRLF itself so that we can later add useful info.
Some header values might be delimited with spaces, so it's not enough to
compare "close" or "keep-alive" with strncasecmp(). Use word_match() for
that.
Supported informations, available via "tr/td title":
- cap: capabilities (proxy)
- mode: one of tcp, http or health (proxy)
- id: SNMP ID (proxy, socket, server)
- IP (socket, server)
- cookie (backend, server)
Implement decreasing health based on observing communication between
HAProxy and servers.
Changes in this version 2:
- documentation
- close race between a started check and health analysis event
- don't force fastinter if it is not set
- better names for options
- layer4 support
Changes in this version 3:
- add stats
- port to the current 1.4 tree
It's a pain to enable regparm because ebtree is built in its corner
and does not depend on the rest of the config. This causes no problem
except that if the regparm settings are not exactly similar, then we
can get inconsistent function interfaces and crashes.
One solution realized in this patch consists in externalizing all
compiler settings and changing CONFIG_XXX_REGPARM into CONFIG_REGPARM
so that we ensure that any sub-component uses the same setting. Since
ebtree used a value here and not a boolean, haproxy's config has been
set to use a number too. Both haproxy's core and ebtree currently use
the same copy of the compiler.h file. That way we don't have any issue
anymore when one setting changes somewhere.
All files referencing the previous ebtree code were changed to point
to the new one in the ebtree directory. A makefile variable (EBTREE_DIR)
is also available to use files from another directory.
The ability to build the libebtree library temporarily remains disabled
because it can have an impact on some existing toolchains and does not
appear worth it in the medium term if we add support for multi-criteria
stickiness for instance.
Capture & display more data from health checks, like
strerror(errno) for L4 failed checks or a first line
from a response for L7 successes/failed checks.
Non ascii or control characters are masked with
chunk_htmlencode() (html stats) or chunk_asciiencode() (logs).
This patch implements "description" (proxy and global) and "node" (global)
options, removes "node-name" and adds "show-node" & "show-desc" options
for "stats". It also changes the way the header lines (with proxy name) and
the statistics are displayed, so stats no longer look so clumsy with very
long names.
Instead of "node-name" it is possible to use show-node/show-desc with
an optional parameter that overrides a default node/description.
backend cust-0045
# report specific values for this customer
stats show-node Europe
stats show-desc Master node for Europe, Asia, Africa
In TCP, we don't want to forward chunks of data, we want to forward
indefinitely. This patch introduces a special value for the amount
of data to be forwarded. When buffer_forward() is called with
BUF_INFINITE_FORWARD, it configures the buffer to never stop
forwarding until the end.
send() supports the MSG_MORE flag on Linux, which does the same
as TCP_CORK except that we don't have to remove TCP_NODELAY before
and we don't need any syscall to set/remove it. This can save up
to 4 syscalls around a send() (two for setting it, two for removing
it), and it's much cleaner since it is not persistent. So make use
of it instead.
The remains of the stats socket code has nothing to do in proto_uxst
anymore and must move to dumpstats. The code is much cleaner and more
structured. It was also an opportunity to rename AN_REQ_UNIX_STATS
as AN_REQ_STATS_SOCK as the stats socket is no longer unix-specific
either.
The last item refering to stats in proto_uxst is the setting of the
task's nice value which should in fact come from the listener.
The new "node-name" stats setting enables reporting of a node ID on
the stats page. It is possible to return the system's host name as
well as a specific name.
We now support up to 10 distinct configuration files. They are
all loaded in the order defined by -f <file1> -f <file2> ...
This can be useful in order to store global, private, public,
etc... configurations in distinct files.
This is a first step towards support of multiple configuration files.
Now readcfgfile() only reads a file in memory and performs very minimal
parsing. The checks are performed afterwards.
Some users are already hitting the 64k source port limit when
connecting to servers. The system usually maintains a list of
unused source ports, regardless of the source IP they're bound
to. So in order to go beyond the 64k concurrent connections, we
have to manage the source ip:port lists ourselves.
The solution consists in assigning a source port range to each
server and use a free port in that range when connecting to that
server, either for a proxied connection or for a health check.
The port must then be put back into the server's range when the
connection is closed.
This mechanism is used only when a port range is specified on
a server. It makes it possible to reach 64k connections per
server, possibly all from the same IP address. Right now it
should be more than enough even for huge deployments.
These functions will be used to deliver asynchronous signals in order
to make the signal handling functions more robust. The goal is to keep
the same interface to signal handlers.
I have attached a patch which will add on every http request a new
header 'X-Original-To'. If you have HAProxy running in transparent mode
with a big number of SQUID servers behind it, it is very nice to have
the original destination ip as a common header to make decisions based
on it.
The whole thing is configurable with a new option 'originalto'. I have
updated the sourcecode as well as the documentation. The 'haproxy-en.txt'
and 'haproxy-fr.txt' files are untouched, due to lack of my french
language knowledge. ;)
Also the patch adds this header for IPv4 only. I haven't any IPv6 test
environment running here and don't know if getsockopt() with SO_ORIGINAL_DST
will work on IPv6. If someone knows it and wants to test it I can modify
the diff. Feel free to ask me questions or things which should be changed. :)
--Maik
This function sets CSS letter spacing after each 3rd digit. The page must
create a class "rls" (right letter spacing) with style "letter-spacing: 0.3em"
in order to use it.
If we get very large data at once, it's almost certain that it's
worthless trying to read again, because we got everything we could
get.
Doing this has made all -EAGAIN disappear from splice reads. The
threshold has been put in the global tunable structures so that if
we one day want to make it accessible from user config, it will be
easy to do so.
Since we're now able to search from a precise expiration date in
the timer tree using ebtree 4.1, we don't need to maintain 4 trees
anymore. Not only does this simplify the code a lot, but it also
ensures that we can always look 24 days back and ahead, which
doubles the ability of the previous scheduler. Indeed, while based
on absolute values, the timer tree is now relative to <now> as we
can always search from <now>-31 bits.
The run queue uses the exact same principle now, and is now simpler
and a bit faster to process. With these changes alone, an overall
0.5% performance gain was observed.
Tests were performed on the few wrapping cases and everything works
as expected.
There are some configurations in which redirect rules are declared
after use_backend rules. We can also find "block" rules after any
of these ones. The processing sequence is :
- block
- redirect
- use_backend
So as of now we try to detect wrong ordering to warn the user about
a possibly undesired behaviour.
Most of the time, task_queue() will immediately return. By extracting
the preliminary checks and putting them in an inline function, we can
significantly reduce the number of calls to the function itself, and
most of the tests can be optimized away due to the caller's context.
Another minor improvement in process_runnable_tasks() consisted in
taking benefit from the processor's branch prediction unit by making
a special case of the process_session() callback which is by far the
most common one.
All this improved performance by about 1%, mainly during the call
from process_runnable_tasks().
Timers are unsigned and used as tree positions. Ticks are signed and
used as absolute date within current time frame. While the two are
normally equal (except zero), it's important not to confuse them in
the code as they are not interchangeable.
We add two inline functions to turn each one into the other.
The comments have also been moved to the proper location, as it was
not easy to understand what was a tick and what was a timer unit.
All the tasks callbacks had to requeue the task themselves, and update
a global timeout. This was not convenient at all. Now the API has been
simplified. The tasks callbacks only have to update their expire timer,
and return either a pointer to the task or NULL if the task has been
deleted. The scheduler will take care of requeuing the task at the
proper place in the wait queue.