Commit Graph

1923 Commits

Author SHA1 Message Date
Thierry FOURNIER
33a7433ac9 MEDIUM: pattern: Index IPv6 addresses in a tree.
This commit adds second tree node in the pattern struct and use it to
index IPv6 addresses. This commit report feature used in the list. If
IPv4 not match the tree, try to convert the IPv4 address in IPv6 with
prefixing the IPv4 address by "::ffff", after this operation, the match
function try lookup in the IPv6 tree. If the IPv6 sample dont match the
IPv6 tree, try to convert the IPv6 addresses prefixed by "2002:IPv4",
"::ffff:IPv4" and "::0000:IPv4" in IPv4 address. after this operation,
the match function try lookup in the IPv4 tree.
2014-03-17 18:06:07 +01:00
Thierry FOURNIER
5338eea8eb MEDIUM: pattern: The match function browse itself the list or the tree.
The match function known the format of the pattern. The pattern can be
stored in a list or in a tree. The pattern matching function use itself
the good entry point and indexation type.

Each pattern matching function return the struct pattern that match. If
the flag "fill" is set, the struct pattern is filled, otherwise the
content of this struct must not be used.

With this feature, the general pattern matching function cannot have
exceptions for building the "struct pattern".
2014-03-17 18:06:07 +01:00
Thierry FOURNIER
1794fdf37e MEDIUM: pattern: The function pattern_exec_match() returns "struct pattern" if the patten match.
Before this commit, the pattern_exec_match() function returns the
associate sample, the associate struct pattern or the associate struct
pattern_tree. This is complex to use, because we can check the type of
information returned.

Now the function return always a "struct pattern". If <fill> is not set,
only the value of the pointer can be used as boolean (NULL or other). If
<fill> is set, you can use the <smp> pointer and the pattern
information.

If information must be duplicated, it is stored in trash buffer.
Otherwise, the pattern can point on existing strings.
2014-03-17 18:06:07 +01:00
Thierry FOURNIER
d437314979 MEDIUM: sample/http_proto: Add new type called method
The method are actuelly stored using two types. Integer if the method is
known and string if the method is not known. The fetch is declared as
UINT, but in some case it can provides STR.

This patch create new type called METH. This type contain interge for
known method and string for the other methods. It can be used with
automatic converters.

The pattern matching can expect method.

During the free or prune function, http_meth pettern is freed. This
patch initialise the freed pointer to NULL.
2014-03-17 18:06:07 +01:00
Thierry FOURNIER
7654c9ff44 MEDIUM: sample: Remove types SMP_T_CSTR and SMP_T_CBIN, replace it by SMP_F_CONST flags
The operations applied on types SMP_T_CSTR and SMP_T_STR are the same,
but the check code and the declarations are double, because it must
declare action for SMP_T_C* and SMP_T_*. The declared actions and checks
are the same. this complexify the code. Only the "conv" functions can
change from "C*" to "*"

Now, if a function needs to modify input string, it can call the new
function smp_dup(). This one duplicate data in a trash buffer.
2014-03-17 18:06:07 +01:00
Thierry FOURNIER
b050463375 MINOR: standard: Add function for converting cidr to network mask. 2014-03-17 18:06:07 +01:00
Thierry FOURNIER
0e9af55700 MINOR: sample: dont call the sample cast function "c_none"
If the cast function to execute is c_none, dont execute it and return
true. The function c_none, do nothing. This save a call.
2014-03-17 18:06:07 +01:00
Thierry FOURNIER
e1bcac5b8f MINOR: pattern: Rename "pat_idx_elt" to "pattern_tree"
This is just for having coherent struct names.
2014-03-17 18:06:07 +01:00
Thierry FOURNIER
edc15c3a35 MEDIUM: pattern: The parse functions just return "struct pattern" without memory allocation
The pattern parse functions put the parsed result in a "struct pattern"
without memory allocation. If the pattern must reference the input data
without changes, the pattern point to the parsed string. If buffers are
needed to store translated data, it use th trash buffer. The indexation
function that allocate the memory later if it is needed.
2014-03-17 18:06:07 +01:00
Thierry FOURNIER
b9b08460a2 MEDIUM: pattern: add indexation function.
Before this patch, the indexation function check the declared patttern
matching function and index the data according with this function. This
is not useful to add some indexation mode.

This commit adds dedicated indexation function. Each struct pattern is
associated with one indexation function. This function permit to index
data according with the type of pattern and with the type of match.
2014-03-17 18:06:06 +01:00
Willy Tarreau
1cf8f08c17 MINOR: sample: move smp_to_type to sample.c
This way it can be exported and reused anywhere else to report type names.
2014-03-17 18:06:06 +01:00
Thierry FOURNIER
3ead5b93c6 MINOR: pattern: separe list element from the data part.
This commit separes the "struct list" used for the chain the "struct
pattern" which contain the pattern data. Later, this change will permit
to manipulate lists ans trees with the same "struct pattern".
2014-03-17 18:06:06 +01:00
Thierry FOURNIER
972028fa67 MEDIUM: pattern: Change the prototype of the function pattern_register().
Each pattern parser take only one string. This change is reported to the
function prototype of the function "pattern_register()". Now, it is
called with just one string and no need to browse the array of args.
2014-03-17 18:06:06 +01:00
Thierry FOURNIER
580c32cb3a MEDIUM: pattern: The pattern parser no more uses <opaque> and just takes one string.
After the previous patches, the "pat_parse_strcat()" function disappear,
and the "pat_parse_int()" and "pat_parse_dotted_ver()" functions dont
use anymore the "opaque" argument, and take only one string on his
input.

So, after this patch, each pattern parser no longer use the opaque
variable and take only one string as input. This patch change the
prototype of the pattern parsing functions.

Now, the "char **args" is replaced by a "char *arg", the "int *opaque"
is removed and these functions return 1 in succes case, and 0 if fail.
2014-03-17 18:06:06 +01:00
Thierry FOURNIER
511e9475f2 MEDIUM: acl/pattern: standardisation "of pat_parse_int()" and "pat_parse_dotted_ver()"
The goal of these patch is to simplify the prototype of
"pat_pattern_*()" functions. I want to replace the argument "char
**args" by a simple "char *arg" and remove the "opaque" argument.

"pat_parse_int()" and "pat_parse_dotted_ver()" are the unique pattern
parser using the "opaque" argument and using more than one string
argument of the char **args. These specificities are only used with ACL.
Other systems using this pattern parser (MAP and CLI) just use one
string for describing a range.

This two functions can read a range, but the min and the max must y
specified. This patch extends the syntax to describe a range with
implicit min and max. This is used for operators like "lt", "le", "gt",
and "ge". the syntax is the following:

   ":x" -> no min to "x"
   "x:" -> "x" to no max

This patch moves the parsing of the comparison operator from the
functions "pat_parse_int()" and "pat_parse_dotted_ver()" to the acl
parser. The acl parser read the operator and the values and build a
volatile string readable by the functions "pat_parse_int()" and
"pat_parse_dotted_ver()". The transformation is done with these rules:

If the parser is "pat_parse_int()":

   "eq x" -> "x"
   "le x" -> ":x"
   "lt x" -> ":y" (with y = x - 1)
   "ge x" -> "x:"
   "gt x" -> "y:" (with y = x + 1)

If the parser is "pat_parse_dotted_ver()":

   "eq x.y" -> "x.y"
   "le x.y" -> ":x.y"
   "lt x.y" -> ":w.z" (with w.z = x.y - 1)
   "ge x.y" -> "x.y:"
   "gt x.y" -> "w.z:" (with w.z = x.y + 1)

Note that, if "y" is not present, assume that is "0".

Now "pat_parse_int()" and "pat_parse_dotted_ver()" accept only one
pattern and the variable "opaque" is no longer used. The prototype of
the pattern parsers can be changed.
2014-03-17 18:06:06 +01:00
Thierry FOURNIER
9eec0a646b MAJOR: auth: Change the internal authentication system.
This patch remove the limit of 32 groups. It also permit to use standard
"pat_parse_str()" function in place of "pat_parse_strcat()". The
"pat_parse_strcat()" is no longer used and its removed. Before this
patch, the groups are stored in a bitfield, now they are stored in a
list of strings. The matching is slower, but the number of groups is
low and generally the list of allowed groups is short.

The fetch function "smp_fetch_http_auth_grp()" used with the name
"http_auth_group" return valid username. It can be used as string for
displaying the username or with the acl "http_auth_group" for checking
the group of the user.

Maybe the names of the ACL and fetch methods are no longer suitable, but
I keep the current names for conserving the compatibility with existing
configurations.

The function "userlist_postinit()" is created from verification code
stored in the big function "check_config_validity()". The code is
adapted to the new authentication storage system and it is moved in the
"src/auth.c" file. This function is used to check the validity of the
users declared in groups and to check the validity of groups declared
on the "user" entries.

This resolve function is executed before the check of all proxy because
many acl needs solved users and groups.
2014-03-17 18:06:06 +01:00
Thierry FOURNIER
d048d8b891 BUG/MINOR: http: fix encoding of samples used in http headers
The binary samples are sometimes copied as is into http headers.
A sample can contain bytes unallowed by the http rfc concerning
header content, for example if it was extracted from binary data.
The resulting http request can thus be invalid.

This issue does not yet happen because haproxy currently (mistakenly)
hex-encodes binary data, so it is not really possible to retrieve
invalid HTTP chars.

The solution consists in hex-encoding all non-printable chars prefixed
by a '%' sign.

No backport is needed since existing code is not affected yet.
2014-03-17 16:39:03 +01:00
Thierry FOURNIER
e059ec9393 MINOR: standard: add function "encode_chunk"
This function has the same behavior as encode_string(), except it
takes a "struct chunk" instead of a "char *" on input.
2014-03-17 16:38:56 +01:00
Willy Tarreau
f79d950163 MEDIUM: proxy: create a tree to store proxies by name
Large configurations can take time to parse when thousands of backends
are in use. Let's store all the proxies in trees.

findproxy_mode() has been modified to use the tree for lookups, which
has divided the parsing time by about 2.5. But many lookups are still
present at many places and need to be dealt with.
2014-03-15 07:48:35 +01:00
Willy Tarreau
80a92c02f4 BUG/MEDIUM: http: don't start to forward request data before the connect
Currently, "balance url_param check_post" randomly works. If the client
sends chunked data and there's another chunk after the one containing the
data, http_request_forward_body() will advance msg->sov and move the start
of data to the beginning of the last chunk, and get_server_ph_post() will
not find the data.

In order to avoid this, we add an HTTP_MSGF_WAIT_CONN flag whose goal is
to prevent the forwarding code from parsing until the connection is
confirmed, so that we're certain not to fail on a redispatch. Note that
we need to force channel_auto_connect() since the output buffer is empty
and a previous analyser might have stopped auto-connect.

The flag is currently set whenever some L7 POST analysis is needed for a
connect() so that it correctly addresses all corner cases involving a
possible rewind of the buffer, waiting for a better fix.

Note that this has been broken for a very long time. Even all 1.4 versions
seem broken but differently, with ->sov pointing to the end of the arguments.
So the fix should be considered for backporting to all stable releases,
possibly including 1.3 which works differently.
2014-03-14 12:22:56 +01:00
Willy Tarreau
36346247ac BUG/MEDIUM: http: continue to emit 503 on keep-alive to different server
Finn Arne Gangstad reported that commit 6b726adb35 ("MEDIUM: http: do
not report connection errors for second and further requests") breaks
support for serving static files by abusing the errorfile 503 statement.

Indeed, a second request over a connection sent to any server or backend
returning 503 would silently be dropped.

The proper solution consists in adding a flag on the session indicating
that the server connection was reused, and to only avoid the error code
in this case.
2014-02-24 18:26:30 +01:00
Willy Tarreau
7e3127391f MINOR: config: make the stream interface idle timer user-configurable
The new tune.idletimer value allows one to set a different value for
idle stream detection. The default value remains set to one second.
It is possible to disable it using zero, and to change the default
value at build time using DEFAULT_IDLE_TIMER.
2014-02-12 16:36:12 +01:00
Willy Tarreau
b145c78623 MINOR: channel: add the date of last read in the channel
We store the time stamp of last read in the channel in order to
be able to measure some bit rate and pause lengths. We only use
16 bits which were unused for this. We don't need more, as it
allows us to measure with a millisecond precision for up to 65s.
2014-02-12 11:45:59 +01:00
Willy Tarreau
8f39dcdc8d BUG/MINOR: channel: initialize xfer_small/xfer_large on new buffers
These ones are only reset during transfers. There is a low but non-null
risk that a first full read causes the previous value to be reused and
immediately to immediately set the CF_STREAMER flag. The impact is only
to increase earlier than expected the SSL record size and to use splice().

This bug was already present in 1.4, so a backport is possible.
2014-02-12 11:45:45 +01:00
Bhaskar Maddala
a20cb85eba MINOR: stats: Enhancement to stats page to provide information of last session time.
Summary:
Track and report last session time on the stats page for each server
in every backend, as well as the backend.

This attempts to address the requirement in the ROADMAP

  - add a last activity date for each server (req/resp) that will be
    displayed in the stats. It will be useful with soft stop.

The stats page reports this as time elapsed since last session. This
change does not adequately address the requirement for long running
session (websocket, RDP... etc).
2014-02-08 01:19:58 +01:00
Willy Tarreau
7bed945be0 OPTIM: ssl: implement dynamic record size adjustment
By having the stream interface pass the CF_STREAMER flag to the
snd_buf() primitive, we're able to tell the send layer whether
we're sending large chunks or small ones.

We use this information in SSL to adjust the max record dynamically.
This results in small chunks respecting tune.ssl.maxrecord at the
beginning of a transfer or for small transfers, with an automatic
switch to full records if the exchanges last long. This allows the
receiver to parse HTML contents on the fly without having to retrieve
16kB of data, which is even more important with small initcwnd since
the receiver does not need to wait for round trips to start fetching
new objects. However, sending large files still produces large chunks.

For example, with tune.ssl.maxrecord = 2859, we see 5 write(2885)
sent in two segments each and 6 write(16421).

This idea was first proposed on the haproxy mailing list by Ilya Grigorik.
2014-02-06 11:37:29 +01:00
Willy Tarreau
1049b1f551 MEDIUM: connection: don't use real send() flags in snd_buf()
This prevents us from passing other useful info and requires the
upper levels to know these flags. Let's use a new flags category
instead : CO_SFL_*. For now, only MSG_MORE has been remapped.
2014-02-06 11:37:29 +01:00
Baptiste Assmann
69e273f3fc MEDIUM: tcp-check new feature: connect
A new tcp-check rule type: connect.
It allows HAProxy to test applications which stand on multiple ports or
multiple applications load-balanced through the same backend.
2014-02-03 00:24:11 +01:00
Willy Tarreau
70dffdaa10 MAJOR: http: switch to keep-alive mode by default
Since we support HTTP keep-alive, there is no more reason for staying
in tunnel mode by default. It is confusing for new users and creates
more issues than it solves. Option "http-tunnel" is available to force
to use it if really desired.

Switching to KA by default has implied to change the value of some
option flags and some transaction flags so that value zero (default)
matches keep-alive. That explains why more code has been changed than
expected. Tests have been run on the 25 combinations of frontend and
backend options, plus a few with option http-pretend-keepalive, and
no anomaly was found.

The relation between frontend and backends remains the same. Options
have been updated to take precedence over http-keep-alive which is now
implicit.

All references in the doc to haproxy not supporting keep-alive have
been fixed, and the doc for config options has been updated.
2014-01-30 03:14:29 +01:00
Willy Tarreau
02bce8be01 MAJOR: http: update connection mode configuration
At the very beginning of haproxy, there was "option httpclose" to make
haproxy add a "Connection: close" header in both directions to invite
both sides to agree on closing the connection. It did not work with some
rare products, so "option forceclose" was added to do the same and actively
close the connection. Then client-side keep-alive was supported, so option
http-server-close was introduced. Now we have keep-alive with a fourth
option, not to mention the implicit tunnel mode.

The connection configuration has become a total mess because all the
options above may be combined together, despite almost everyone thinking
they cancel each other, as judging from the common problem reports on the
mailing list. Unfortunately, re-reading the doc shows that it's not clear
at all that options may be combined, and the opposite seems more obvious
since they're compared. The most common issue is options being set in the
defaults section that are not negated in other sections, but are just
combined when the user expects them to be overloaded. The migration to
keep-alive by default will only make things worse.

So let's start to address the first problem. A transaction can only work in
5 modes today :
  - tunnel : haproxy doesn't bother with what follows the first req/resp
  - passive close : option http-close
  - forced close : option forceclose
  - server close : option http-server-close with keep-alive on the client side
  - keep-alive   : option http-keep-alive, end to end

All 16 combination for each section fall into one of these cases. Same for
the 256 combinations resulting from frontend+backend different modes.

With this patch, we're doing something slightly different, which will not
change anything for users with valid configs, and will only change the
behaviour for users with unsafe configs. The principle is that these options
may not combined anymore, and that the latest one always overrides all the
other ones, including those inherited from the defaults section. The "no
option xxx" statement is still supported to cancel one option and fall back
to the default one. It is mainly needed to ignore defaults sections (eg:
force the tunnel mode). The frontend+backend combinations have not changed.

So for examplen the following configuration used to put the connection
into forceclose :

    defaults http
        mode http
        option httpclose

    frontend foo.
        option http-server-close

  => http-server-close+httpclose = forceclose before this patch! Now
     the frontend's config replaces the defaults config and results in
     the more expected http-server-close.

All 25 combinations of the 5 modes in (frontend,backend) have been
successfully tested.

In order to prepare for upcoming changes, a new "option http-tunnel" was
added. It currently only voids all other options, and has the lowest
precedence when mixed with another option in another frontend/backend.
2014-01-30 03:14:29 +01:00
Emeric Brun
850efd5149 MEDIUM: ssl: Set verify 'required' as global default for servers side.
If no CA file specified on a server line, the config parser will show an error.

Adds an cmdline option '-dV' to re-set verify 'none' as global default on
servers side (previous behavior).

Also adds 'ssl-server-verify' global statement to set global default to
'none' or 'required'.

WARNING: this changes the default verify mode from "none" to "required" on
the server side, and it *will* break insecure setups.
2014-01-29 17:08:15 +01:00
Willy Tarreau
cc08d2c9ff MEDIUM: counters: stop relying on session flags at all
Till now, we had one flag per stick counter to indicate if it was
tracked in a backend or in a frontend. We just had to add another
flag per stick-counter to indicate if it relies on contents or just
connection. These flags are quite painful to maintain and tend to
easily conflict with other flags if their number is changed.

The correct solution consists in moving the flags to the stkctr struct
itself, but currently this struct is made of 2 pointers, so adding a
new entry there to store only two bits will cause at least 16 more bytes
to be eaten per counter due to alignment issues, and we definitely don't
want to waste tens to hundreds of bytes per session just for things that
most users don't use.

Since we only need to store two bits per counter, an intermediate
solution consists in replacing the entry pointer with a composite
value made of the original entry pointer and the two flags in the
2 unused lower bits. If later a need for other flags arises, we'll
have to store them in the struct.

A few inline functions have been added to abstract the retrieval
and assignment of the pointers and flags, resulting in very few
changes. That way there is no more dependence on the number of
stick-counters and their position in the session flags.
2014-01-28 23:34:45 +01:00
Willy Tarreau
bb519c7cd1 MINOR: tools: add very basic support for composite pointers
Very often we want to associate one or two flags to a pointer, to
put a type on it or whatever. This patch provides this in standard.h
in the form of a few inline functions which combine a void * pointer
with an int and return an unsigned long called a composite address.
The functions allow to individuall set, retrieve both the pointer and
the flags. This is very similar to what is used in ebtree in fact.
2014-01-28 23:34:45 +01:00
Willy Tarreau
f3338349ec BUG/MEDIUM: counters: flush content counters after each request
One year ago, commit 5d5b5d8 ("MEDIUM: proto_tcp: add support for tracking
L7 information") brought support for tracking L7 information in tcp-request
content rules. Two years earlier, commit 0a4838c ("[MEDIUM] session-counters:
correctly unbind the counters tracked by the backend") used to flush the
backend counters after processing a request.

While that earliest patch was correct at the time, it became wrong after
the second patch was merged. The code does what it says, but the concept
is flawed. "TCP request content" rules are evaluated for each HTTP request
over a single connection. So if such a rule in the frontend decides to
track any L7 information or to track L4 information when an L7 condition
matches, then it is applied to all requests over the same connection even
if they don't match. This means that a rule such as :

     tcp-request content track-sc0 src if { path /index.html }

will count one request for index.html, and another one for each of the
objects present on this page that are fetched over the same connection
which sent the initial matching request.

Worse, it is possible to make the code do stupid things by using multiple
counters:

     tcp-request content track-sc0 src if { path /foo }
     tcp-request content track-sc1 src if { path /bar }

Just sending two requests first, one with /foo, one with /bar, shows
twice the number of requests for all subsequent requests. Just because
both of them persist after the end of the request.

So the decision to flush backend-tracked counters was not the correct
one. In practice, what is important is to flush countent-based rules
since they are the ones evaluated for each request.

Doing so requires new flags in the session however, to keep track of
which stick-counter was tracked by what ruleset. A later change might
make this easier to maintain over time.

This bug is 1.5-specific, no backport to stable is needed.
2014-01-28 21:40:28 +01:00
Willy Tarreau
12833bbca5 MINOR: cli: add the new "show pools" command
show pools
  Dump the status of internal memory pools. This is useful to track memory
  usage when suspecting a memory leak for example. It does exactly the same
  as the SIGQUIT when running in foreground except that it does not flush
  the pools.
2014-01-28 16:50:35 +01:00
Willy Tarreau
91b843d0d2 REORG: stats: move the stats socket states to dumpstats.c
There is no more usage of these values outside of dumpstats.c, and
they're easier to maintain there. Also replace the #defines with an
enum.
2014-01-28 16:28:21 +01:00
Willy Tarreau
e43d5323c6 MEDIUM: listener: apply a limit on the session rate submitted to SSL
Just like the previous commit, we sometimes want to limit the rate of
incoming SSL connections. While it can be done for a frontend, it was
not possible for a whole process, which makes sense when multiple
processes are running on a system to server multiple customers.

The new global "maxsslrate" setting is usable to fix a limit on the
session rate going to the SSL frontends. The limits applies before
the SSL handshake and not after, so that it saves the SSL stack from
expensive key computations that would finally be aborted before being
accounted for.

The same setting may be changed at run time on the CLI using
"set rate-limit ssl-session global".
2014-01-28 15:50:10 +01:00
Willy Tarreau
93e7c006c1 MEDIUM: listener: add support for limiting the session rate in addition to the connection rate
It's sometimes useful to be able to limit the connection rate on a machine
running many haproxy instances (eg: per customer) but it removes the ability
for that machine to defend itself against a DoS. Thus, better also provide a
limit on the session rate, which does not include the connections rejected by
"tcp-request connection" rules. This permits to have much higher limits on
the connection rate without having to raise the session rate limit to insane
values.

The limit can be changed on the CLI using "set rate-limit sessions global",
or in the global section using "maxsessrate".
2014-01-28 15:49:27 +01:00
Willy Tarreau
71b734c307 MINOR: cli: add more information to the "show info" output
In addition to previous outputs, we also emit the cumulated number of
connections, the cumulated number of requests, the maximum allowed
SSL connection concurrency, the current number of SSL connections and
the cumulated number of SSL connections. This will help troubleshoot
systems which experience memory shortage due to SSL.
2014-01-28 15:19:44 +01:00
Willy Tarreau
25002d206b MINOR: polling: create function fd_compute_new_polled_status()
This function is used to compute the new polling state based on
the previous state. All pollers have to do this in their update
loop, so better centralize the logic for it.
2014-01-26 00:42:32 +01:00
Willy Tarreau
e852545594 MEDIUM: polling: centralize polled events processing
Currently, each poll loop handles the polled events the same way,
resulting in a lot of duplicated, complex code. Additionally, epoll
was the only one to handle newly created FDs immediately.

So instead, let's move that code to fd.c in a new function dedicated
to this task : fd_process_polled_events(). All pollers now use this
function.
2014-01-26 00:42:32 +01:00
Willy Tarreau
6c11bd2f89 OPTIM: raw-sock: don't speculate after a short read if polling is enabled
This is the reimplementation of the "done" action : when we experience
a short read, we're almost certain that we've exhausted the system's
buffers and that we'll meet an EAGAIN if we attempt to read again. If
the FD is not yet polled, the stream interface already takes care of
stopping the speculative read. When the FD is already being polled, we
have two options :
  - either we're running from a level-triggered poller, in which case
    we'd rather report that we've reached the end so that we don't
    speculate over the poller and let it report next time data are
    available ;

  - or we're running from an edge-triggered poller in which case we
    have no choice and have to see the EAGAIN to re-enable events.

At the moment we don't have any edge-triggered poller, so it's desirable
to avoid speculative I/O that we know will fail.

Note that this must not be ported to SSL since SSL hides the real
readiness of the file descriptor.

Thanks to this change, we observe no EAGAIN anymore during keep-alive
transfers, and failed recvfrom() are reduced by half in http-server-close
mode (the client-facing side is always being polled and the second recv
can be avoided). Doing so results in about 5% performance increase in
keep-alive mode. Similarly, we used to have up to about 1.6% of EAGAIN
on accept() (1/maxaccept), and these have completely disappeared under
high loads.
2014-01-26 00:42:32 +01:00
Willy Tarreau
baf5b9b445 CLEANUP: connection: fix comments in connection.h to reflect new behaviour.
The polling has substantially changed, better fix the comments.
2014-01-26 00:42:31 +01:00
Willy Tarreau
aad69387ac CLEANUP: connection: use conn_xprt_ready() instead of checking the flag
It's easier and safer to rely on conn_xprt_ready() everywhere than to
check the flag itself. It will also simplify adding extra checks later
if needed. Some useless controls for !xprt have been removed, as the
XPRT_READY flag itself guarantees xprt is set.
2014-01-26 00:42:31 +01:00
Willy Tarreau
3c72872da1 CLEANUP: connection: use conn_ctrl_ready() instead of checking the flag
It's easier and safer to rely on conn_ctrl_ready() everywhere than to
check the flag itself. It will also simplify adding extra checks later
if needed. Some useless controls for !ctrl have been removed, as the
CTRL_READY flag itself guarantees ctrl is set.
2014-01-26 00:42:31 +01:00
Willy Tarreau
e1f50c4b02 MEDIUM: connection: remove conn_{data,sock}_poll_{recv,send}
We simply remove these functions and replace their calls with the
appropriate ones :

  - if we're in the data phase, we can simply report wait on the FD
  - if we're in the socket phase, we may also have to signal the
    desire to read/write on the socket because it might not be
    active yet.
2014-01-26 00:42:30 +01:00
Willy Tarreau
310987a038 MAJOR: connection: remove the CO_FL_WAIT_{RD,WR} flags
These flags were used to report the readiness of the file descriptor.
Now this readiness is directly checked at the file descriptor itself.
This removes the need for constantly synchronizing updates between the
file descriptor and the connection and ensures that all layers share
the same level of information.

For now, the readiness is updated in conn_{sock,data}_poll_* by directly
touching the file descriptor. This must move to the lower layers instead
so that these functions can disappear as well. In this state, the change
works but is incomplete. It's sensible enough to avoid making it more
complex.

Now the sock/data updates become much simpler because they just have to
enable/disable access to a file descriptor and not to care anymore about
its readiness.
2014-01-26 00:42:30 +01:00
Willy Tarreau
f817e9f473 MAJOR: polling: rework the whole polling system
This commit heavily changes the polling system in order to definitely
fix the frequent breakage of SSL which needs to remember the last
EAGAIN before deciding whether to poll or not. Now we have a state per
direction for each FD, as opposed to a previous and current state
previously. An FD can have up to 8 different states for each direction,
each of which being the result of a 3-bit combination. These 3 bits
indicate a wish to access the FD, the readiness of the FD and the
subscription of the FD to the polling system.

This means that it will now be possible to remember the state of a
file descriptor across disable/enable sequences that generally happen
during forwarding, where enabling reading on a previously disabled FD
would result in forgetting the EAGAIN flag it met last time.

Several new state manipulation functions have been introduced or
adapted :
  - fd_want_{recv,send} : enable receiving/sending on the FD regardless
    of its state (sets the ACTIVE flag) ;

  - fd_stop_{recv,send} : stop receiving/sending on the FD regardless
    of its state (clears the ACTIVE flag) ;

  - fd_cant_{recv,send} : report a failure to receive/send on the FD
    corresponding to EAGAIN (clears the READY flag) ;

  - fd_may_{recv,send}  : report the ability to receive/send on the FD
    as reported by poll() (sets the READY flag) ;

Some functions are used to report the current FD status :

  - fd_{recv,send}_active
  - fd_{recv,send}_ready
  - fd_{recv,send}_polled

Some functions were removed :
  - fd_ev_clr(), fd_ev_set(), fd_ev_rem(), fd_ev_wai()

The POLLHUP/POLLERR flags are now reported as ready so that the I/O layers
knows it can try to access the file descriptor to get this information.

In order to simplify the conditions to add/remove cache entries, a new
function fd_alloc_or_release_cache_entry() was created to be used from
pollers while scanning for updates.

The following pollers have been updated :

   ev_select() : done, built, tested on Linux 3.10
   ev_poll()   : done, built, tested on Linux 3.10
   ev_epoll()  : done, built, tested on Linux 3.10 & 3.13
   ev_kqueue() : done, built, tested on OpenBSD 5.2
2014-01-26 00:42:30 +01:00
Willy Tarreau
033cd9d78c REORG: polling: rename "fd_process_spec_events()" to "fd_process_cached_events()"
This is in order to be coherent with the rest.
2014-01-26 00:42:29 +01:00
Willy Tarreau
899d95757e REORG: polling: rename the cache allocation functions
- alloc_spec_entry() becomes fd_alloc_cache_entry()
- release_spec_entry() becomes fd_release_cache_entry()
2014-01-26 00:42:29 +01:00
Willy Tarreau
16f649c82c REORG: polling: rename "fd_spec" to "fd_cache"
So fd_spec was renamed "fd_cache" as it's becoming an event cache, and
fd_nbspec becomes fd_cache_num.
2014-01-26 00:42:29 +01:00
Willy Tarreau
15a4dec87e REORG: polling: rename "spec_e" to "state" and "spec_p" to "cache"
We're completely changing the way FDs will be polled. There will be no
more speculative I/O since we'll know the exact FD state, so these will
only be cached events.

First, let's fix a few field names which become confusing. "spec_e" was
used to store a speculative I/O event state. Now we'll store the whole
R/W states for the FD there. "spec_p" was used to store a speculative
I/O cache position. Now let's clearly call it "cache".
2014-01-26 00:42:29 +01:00
Willy Tarreau
69a41fa8a3 CLEANUP: polling: rename "spec_e" to "state"
We're completely changing the way FDs will be polled. First, let's fix
a few field names which become confusing. "spec_e" was used to store a
speculative I/O event state. Now we'll store the whole R/W states for
the FD there.
2014-01-26 00:42:28 +01:00
Willy Tarreau
1f0da2485e BUG/MEDIUM: unique_id: HTTP request counter is not stable
Patrick Hemmer reported that using unique_id_format and logs did not
report the same unique ID counter since commit 9f09521 ("BUG/MEDIUM:
unique_id: HTTP request counter must be unique!"). This is because
the increment was done while producing the log message, so it was
performed twice.

A better solution consists in fetching a new value once per request
and saving it in the request or session context for all of this
request's life.

It happens that sessions already have a unique ID field which is used
for debugging and reporting errors, and which differs from the one
sent in logs and unique_id header.

So let's change this to reuse this field to have coherent IDs everywhere.
As of now, a session gets a new unique ID once it is instanciated. This
means that TCP sessions will also benefit from a unique ID that can be
logged. And this ID is renewed for each extra HTTP request received on
an existing session. Thus, all TCP sessions and HTTP requests will have
distinct IDs that will be stable along all their life, and coherent
between all places where they're used (logs, unique_id header,
"show sess", "show errors").

This feature is 1.5-specific, no backport to 1.4 is needed.
2014-01-25 11:07:06 +01:00
Willy Tarreau
45b34e8abc MINOR: connection: add more error codes to report connection errors
It is quite often that an connection error only reports "socket error" with
no more information. This is especially problematic with health checks where
many causes are possible, including resource exhaustion which do not lead to
a valid errno code. So let's add explicit codes to cover these cases.
2014-01-24 16:15:04 +01:00
Thierry FOURNIER
e7ba23633b MINOR: pattern: move functions for grouping pat_match_* and pat_parse_* and add documentation. 2014-01-21 22:14:21 +01:00
Willy Tarreau
2aefad5df7 MINOR: connection: add a new conn_drain() function
Till now there was no way to know from a connection if a previous
call to drain() had done any change. This function is used to drain
incoming data and to update the connection's flags at the same time.
It also correctly sets the polling flags on the connection if the
drain function indicates inability to receive. This function will
be used preferably over ctrl->drain() when a connection is used.
2014-01-20 22:27:16 +01:00
Willy Tarreau
2317976daa BUILD: listener: fix recent accept4() again
Recent commit 4448925 ("BUILD/MINOR: listener: remove a glibc warning on accept4()")
broke accept4() on some systems because the glibc's version may now conflict with
the local one.
2014-01-15 16:45:17 +01:00
Willy Tarreau
8663105095 BUG: Revert "OPTIM: poll: restore polling after a poll/stop/want sequence"
This reverts commit 1208266356.

It randomly breaks SSL. What happens is that if the SSL response is
read at once by the SSL stack and is partially delivered to the buffer,
then there's no way to read the next parts because we wait for some
polling first.

So we'll fix this after the polling rework.
2014-01-13 11:34:42 +01:00
Willy Tarreau
9fe7aae6eb MINOR: checks: use an inline function for health_adjust()
This function is called twice per request, and does almost always nothing.
Better use an inline version to avoid entering it when we can.

About 0.5% additional performance was gained this way.
2013-12-31 23:47:37 +01:00
Willy Tarreau
9e5a3aacf4 MEDIUM: stream-int: make si_connect() return an established state when possible
si_connect() used to only return SI_ST_CON. But it already detect the
connection reuse and is the function which avoids calling connect().
So it already knows the connection is valid and reuse. Thus we make it
return SI_ST_EST when a connection is reused. This means that
connect_server() can return this state and sess_update_stream_int()
as well.

Thanks to this change, we don't need to leave process_session() in
SI_ST_CON state to immediately enter it again to switch to SI_ST_EST.
Implementing this removes one call to process_session() per request
in keep-alive mode. We're now at 2 calls per request, which is the
minimum (one for the request and another one for the response). The
number of calls to http_wait_for_response() has also dropped from 2
to one.

Tests indicate a performance gain of about 2.6% in request rate in
keep-alive mode. There should be no gain in http-server-close() since
we don't use this faster path.
2013-12-31 23:32:12 +01:00
Willy Tarreau
d7ad9f5b0d MAJOR: channel: add a new flag CF_WAKE_WRITE to notify the task of writes
Since commit 6b66f3e ([MAJOR] implement autonomous inter-socket forwarding)
introduced in 1.3.16-rc1, we've been relying on a stupid mechanism to wake
up the task after a write, which was an exact copy-paste of the reader side.

The principle was that if we empty a buffer and there's no forwarding
scheduled or if the *producer* is not in a connected state, then we wake
the task up.

That does not make any sense. It happens to wake up too late sometimes (eg,
when the request analyser waits for some room in the buffer to start to
work), and leads to unneeded wakeups in client-side keep-alive, because
the task is woken up when the response is sent, while the analysers are
simply waiting for a new request.

In order to fix this, we introduce a new channel flag : CF_WAKE_WRITE. It
is designed so that an analyser can explicitly request being notified when
some data were written. It is used only when the HTTP request or response
analysers need to wait for more room in the buffers. It is automatically
cleared upon wake up.

The flag is also automatically set by the functions which try to write into
a buffer from an applet when they fail (bi_putblk() etc...).

That allows us to remove the stupid condition above and avoid some wakeups.
In http-server-close and in http-keep-alive modes, this reduces from 4 to 3
the average number of wakeups per request, and increases the overall
performance by about 1.5%.
2013-12-31 18:37:36 +01:00
Willy Tarreau
51437d2c59 Revert "MEDIUM: stats: add support for HTTP keep-alive on the stats page"
This reverts commit f3221f99ac.

Igor reported some very strange breakage of his stats page which is
clearly caused by the chunking, though I don't see at first glance
what could be wrong. Better revert it for now.
2013-12-29 00:43:40 +01:00
Willy Tarreau
f3221f99ac MEDIUM: stats: add support for HTTP keep-alive on the stats page
In theory the principle is simple as we just need to send HTTP chunks
if the client is 1.1 compatible. In practice it's harder because we
have to append a CR LF after each block of data and we're never sure
to have the room for this. In order not to have to deal with this, we
instead send the CR LF prior to each chunk size. The only issue is for
the first chunk and for this reason we avoid to send the empty header
line when using chunked encoding.
2013-12-28 21:40:16 +01:00
Willy Tarreau
983eb31fd1 BUG/MINOR: channel: CHN_INFINITE_FORWARD must be unsigned
This value is stored as unsigned in chn->to_forward. Having it defined
as signed makes it impossible to pass channel_forward() a previously
saved value because the argument will be zero-extended during the
conversion to long long, while the test will be performed using sign
extension. There is no impact on existing code right now.
2013-12-28 21:33:37 +01:00
Willy Tarreau
1208266356 OPTIM: poll: restore polling after a poll/stop/want sequence
If a file descriptor is being polled, and it stopped (eg: buffer full
or end of response), then re-enabled, currently what happens is that
the polling is disabled, then the fd is enabled in speculative mode,
an I/O attempt is made, it loses (otherwise the FD would surely not
have been polled), and the polled is enabled again.

This is too bad, especially with HTTP keep-alive on the server side
where all operations are performed at once before going back to the
poll loop.

Now we improve the behaviour by ensuring that if an fd is still being
polled, when it's enabled after having been disabled, we re-enable the
polling. Doing so saves a number of syscalls and useless wakeups, and
results in a significant performance gain on HTTP keep-alive. A 11%
increase has been observed on the HTTP request rate in keep-alive
thanks to this.

It could be considered as a bug fix, but there was no harm with the
current behaviour, except extra syscalls.
2013-12-27 20:18:52 +01:00
Willy Tarreau
068621e4ad MINOR: http: try to stick to same server after status 401/407
In HTTP keep-alive mode, if we receive a 401, we still have a chance
of being able to send the visitor again to the same server over the
same connection. This is required by some broken protocols such as
NTLM, and anyway whenever there is an opportunity for sending the
challenge to the proper place, it's better to do it (at least it
helps with debugging).
2013-12-23 15:12:44 +01:00
Willy Tarreau
2737562e43 MEDIUM: stream-int: implement a very simplistic idle connection manager
Idle connections are not monitored right now. So if a server closes after
a response without advertising it, it won't be detected until a next
request wants to use the connection. This is a bit problematic because
it unnecessarily maintains file descriptors and sockets in an idle
state.

This patch implements a very simple idle connection manager for the stream
interface. It presents itself as an I/O callback. The HTTP engine enables
it when it recycles a connection. If a close or an error is detected on the
underlying socket, it tries to drain as much data as possible from the socket,
detect the close and responds with a close as well, then detaches from the
stream interface.
2013-12-17 00:00:28 +01:00
Willy Tarreau
e38feed966 BUG/MINOR: stats: correctly report throttle rate of low weight servers
The throttling of low weight servers (<16) could mistakenly be reported
as > 100% due to a rounding that was performed before a multiply by 100
instead of after. This was introduced in 1.5-dev20 when fixing a previous
reporting issue by commit d32c399 (MINOR: stats: report correct throttling
percentage for servers in slowstart).

It should be backported if the patch above is backported.
2013-12-16 18:04:57 +01:00
Willy Tarreau
9420b1271d MINOR: http: add option prefer-last-server
When the load balancing algorithm in use is not deterministic, and a previous
request was sent to a server to which haproxy still holds a connection, it is
sometimes desirable that subsequent requests on a same session go to the same
server as much as possible. Note that this is different from persistence, as
we only indicate a preference which haproxy tries to apply without any form
of warranty. The real use is for keep-alive connections sent to servers. When
this option is used, haproxy will try to reuse the same connection that is
attached to the server instead of rebalancing to another server, causing a
close of the connection. This can make sense for static file servers. It does
not make much sense to use this in combination with hashing algorithms.
2013-12-16 02:23:54 +01:00
Willy Tarreau
b490b4e5ad MAJOR: stream-int: handle the connection reuse in si_connect()
This is the best place to reuse a connection. We centralize all
connection requests and we're at the best place to know exactly
what the current state of the underlying connection is. If the
connection is reused, we just enable polling for send() in order
to be able to emit the request.
2013-12-16 02:23:53 +01:00
Willy Tarreau
9471b8ced9 MEDIUM: connection: inform si_alloc_conn() whether existing conn is OK or not
When allocating a new connection, only the caller knows whether it's
acceptable to reuse the previous one or not. Let's pass this information
to si_alloc_conn() which will do the cleanup if the connection is not
acceptable.
2013-12-16 02:23:53 +01:00
Willy Tarreau
ad38acedaa MEDIUM: connection: centralize handling of nolinger in fd management
Right now we see many places doing their own setsockopt(SO_LINGER).
Better only do it just before the close() in fd_delete(). For this
we add a new flag on the file descriptor, indicating if it's safe or
not to linger. If not (eg: after a connect()), then the setsockopt()
call is automatically performed before a close().

The flag automatically turns to safe when receiving a read0.
2013-12-16 02:23:52 +01:00
Willy Tarreau
d02cdd23be MINOR: connection: add simple functions to report connection readiness
conn_xprt_ready() reports if the transport layer is ready.
conn_ctrl_ready() reports if the control layer is ready.

The stream interface uses si_conn_ready() to report that the
underlying connection is ready. This will be used for connection
reuse in keep-alive mode.
2013-12-16 02:23:52 +01:00
Willy Tarreau
3343432fcd MINOR: checks: add a flag to indicate what check is an agent
Currently to know if a check is an agent, we compare its pointer to its
servers' agent pointer. Better have a flag in its state to indicate this.
2013-12-14 16:02:20 +01:00
Willy Tarreau
33a08db932 MINOR: checks: add a PAUSED state for the checks
Health checks can now be paused. This is the status they get when the
server is put into maintenance mode, which is more logical than relying
on the server's state at some places. It will be needed to allow agent
checks to run when health checks are disabled (currently not possible).
2013-12-14 16:02:20 +01:00
Willy Tarreau
ff5ae35b9f MINOR: checks: use check->state instead of srv->state & SRV_CHECKED
Having the check state partially stored in the server doesn't help.
Some functions such as srv_getinter() rely on the server being checked
to decide what check frequency to use, instead of relying on the check
being configured. So let's get rid of SRV_CHECKED and SRV_AGENT_CHECKED
and only use the check's states instead.
2013-12-14 16:02:19 +01:00
Willy Tarreau
2e10f5a759 MINOR: checks: replace state DISABLED with CONFIGURED and ENABLED
At the moment, health checks and agent checks are tied : no agent
check is emitted if no health check is enabled. Other parameters
are considered in the condition for letting checks run. It will
help us selectively enable checks (agent and regular checks) to be
know whether they're enabled/disabled and configured or not. Now
we can already emit an error when trying to enable an unconfigured
agent.
2013-12-14 16:02:19 +01:00
Willy Tarreau
2c115e5047 MINOR: checks: rename the state flags
The flag CHK_STATE_RUNNING is misleading as one may believe it means
the state is enabled (just like SRV_RUNNING). Let's rename these two
flags CHK_ST_INPROGRESS and CHK_ST_DISABLED.
2013-12-14 16:02:19 +01:00
Willy Tarreau
6aaa1b87cf MINOR: checks: use an enum instead of flags to report a check result
We used to have up to 4 sets of flags which were almost all exclusive
to report a check result. And the names were inherited from the old
server states, adding to the confusion. Let's replace that with an
enum handling only the possible combinations :

   SRV_CHK_UNKNOWN                   => CHK_RES_UNKNOWN
   SRV_CHK_FAILED                    => CHK_RES_FAILED
   SRV_CHK_PASSED                    => CHK_RES_PASSED
   SRV_CHK_PASSED | SRV_CHK_DISABLE  => CHK_RES_CONDPASS
2013-12-14 16:02:19 +01:00
Willy Tarreau
8e85ad5211 REORG: checks: retrieve the check-specific defines from server.h to checks.h
After the move of checks from servers to autonomous checks, we need a
massive cleanup and reordering as it's becoming increasingly difficult
to find the definitions of types and enums.

Nothing was changed, blocks were just moved.
2013-12-14 16:02:18 +01:00
Willy Tarreau
1a53a3af13 MINOR: checks: improve handling of the servers tracking chain
Server tracking uses the same "tracknext" list for servers tracking
another one and for the servers being tracked. This caused an issue
which was fixed by commit f39c71c ([CRITICAL] fix server state tracking:
it was O(n!) instead of O(n)), consisting in ensuring that a server is
being checked before walking down the list, so that we don't propagate
the up/down information via servers being part of the track chain.

But the root cause is the fact that all servers share the same list.
The correct solution consists in having a list head for the tracked
servers and a list of next tracking servers. This simplifies the
propagation logic, especially for the case where status changes might
be passed to individual servers via the CLI.
2013-12-14 16:02:18 +01:00
Willy Tarreau
89efaed6b6 BUILD: definitely silence some stupid GCC warnings
It's becoming increasingly difficult to ignore unwanted function returns in
debug code with gcc. Now even when you try to work around it, it suggests a
way to write your code differently. For example :

    src/frontend.c:187:65: warning: if statement has empty body [-Wempty-body]
                if (write(1, trash.str, trash.len) < 0) /* shut gcc warning */;
                                                                              ^
    src/frontend.c:187:65: note: put the semicolon on a separate line to silence this warning
    1 warning generated.

This is totally unacceptable, this code already had to be written this way
to shut it up in earlier versions. And now it comments the form ? What's the
purpose of the C language if you can't write anymore the code that does what
you want ?

Emeric proposed to just keep a global variable to drain such useless results
so that gcc stops complaining all the time it believes people who write code
are monkeys. The solution is acceptable because the useless assignment is done
only in debug code so it will not impact performance. This patch implements
this, until gcc becomes even "smarter" to detect that we tried to cheat.
2013-12-13 15:21:36 +01:00
Willy Tarreau
5f3f15f618 BUILD: time: adapt the type of TV_ETERNITY to the local system
Some systems use different types for tv_sec/tv_usec, some are
signed others not. From time to time new warnings are reported
about implicit casts being done.

This patch ensures that TV_ETERNITY is cast to the appropriate
type in assignments and conversions.
2013-12-13 09:22:23 +01:00
Willy Tarreau
975c1784c8 MINOR: sample: make sample_parse_expr() use memprintf() to report parse errors
Doing so ensures that we're consistent between all the functions in the whole
chain. This is important so that we can extract the argument parsing from this
function.
2013-12-12 23:16:54 +01:00
Thierry FOURNIER
c0e0d7b7cf MEDIUM: map: dynamic manipulation of maps
This patch adds map manipulation commands to the socket interface.

add map <map> <key> <value>
  Add the value <value> in the map <map>, at the entry corresponding to
  the key <key>. This command does not verify if the entry already
  exists.

clear map <map>
  Remove entries from the map <map>

del map <map> <key>
  Delete all the map entries corresponding to the <key> value in the map
  <map>.

set map <map> <key> <value>
  Modify the value corresponding to each key <key> in a map <map>. The
  new value is <value>.

show map [<map>]
  Dump info about map converters. Without argument, the list of all
  available maps are returned. If a <map> is specified, is content is
  dumped.
2013-12-12 15:58:30 +01:00
Thierry FOURNIER
01cdcd4a62 MINOR: pattern: add function to lookup a specific entry in pattern list
This is used to dynamically delete or update map entry.
2013-12-12 15:50:01 +01:00
Thierry FOURNIER
b0c0a0f940 MINOR: map: export parse output sample functions
This export is used to identify the parser used
2013-12-12 15:44:05 +01:00
Thierry FOURNIER
7609064fc3 MINOR: pattern: make the pattern matching function return a pointer to the matched element
This feature will be used by the CLI to look up keys.
2013-12-12 15:44:05 +01:00
Thierry FOURNIER
0b2fe4a5cd MINOR: pattern: add support for compiling patterns for lookups
With this patch, patterns can be compiled for two modes :
  - match
  - lookup

The match mode is used for example in ACLs or maps. The lookup mode
is used to lookup a key for pattern maintenance. For example, looking
up a network is different from looking up one address belonging to
this network.

A special case is made for regex. In lookup mode they return the input
regex string and do not compile the regex.
2013-12-12 15:44:02 +01:00
Thierry FOURNIER
39e258fcee MINOR: regex: Copy the original regex expression into string.
This is useful for the debug or for search regex in maps.
2013-12-12 15:43:34 +01:00
Thierry FOURNIER
799c042daa MINOR: regex: Change the struct containing regex
This change permits to remove the typedef. The original regex structs
are set in haproxy's struct.
2013-12-12 15:42:58 +01:00
Thierry FOURNIER
7148ce6ef4 MEDIUM: pattern: Extract the index process from the pat_parse_*() functions
Now, the pat_parse_*() functions parses the incoming data. The input
"pattern" struct can be preallocated. If the parser needs to add some
buffers, it allocates memory.

The function pattern_register() runs the call to the parser, process
the key indexation and associate the "sample_storage" used by maps.
2013-12-12 15:42:11 +01:00
Thierry FOURNIER
e3ded59706 MEDIUM: acl: Last patch change the output type
This patch remove the compatibility check from the input type and the
match method. Now, it checks if a casts from the input type to output
type exists and the pattern_exec_match() function apply casts before
each pattern matching.
2013-12-12 15:42:11 +01:00
Thierry FOURNIER
cc0e0b3dbb MINOR: pattern: Each pattern sets the expected input type
This is used later for increasing the compability with incoming
sample types. When multiple compatible types are supported, one
is arbitrarily used (eg: UINT).
2013-12-12 11:07:33 +01:00
Thierry FOURNIER
2d4771ba17 MINOR: map: export map_get_reference() function
This function is used to identify map with his reference into the CLI
functions.
2013-12-11 22:05:03 +01:00
Willy Tarreau
9ba813cd69 CLEANUP: check: server port is unsigned
Baptiste Assmann reported some confusing printf() output of the server
port since it's declared signed. Better turn it to unsigned.

There's no need to backport this, it's only used in 16-bit places.
2013-12-10 23:32:30 +01:00
Willy Tarreau
2d400bb931 MINOR: stream_interface: add reporting of ressouce allocation errors
SSL and keep-alive will need to be able to fail on allocation errors,
and the stream interface did not allow to report such a cause. The flag
will then be "RC" as already documented.
2013-12-09 17:12:18 +01:00
Willy Tarreau
05efc0f33a DIET/MINOR: task: reduce struct task size by 8 bytes
Just by reordering the struct task, we could shrink it by 8 bytes from
120 to 112 bytes. A careful reordering allowed each part to be located
closer to the hot parts it's used with, resulting in another performance
increase of about 0.5%.
2013-12-09 16:06:22 +01:00
Willy Tarreau
5735d7e2a2 MINOR: http: use an enum for the auth method in http_auth_data
This method now takes a single byte, with 7 bytes left to be used
after it. No savings were gained but at least now we have an enum.
2013-12-09 16:06:22 +01:00
Willy Tarreau
3770f23a3a MINOR: http: switch the http state to an enum
This reduces its size which is not reused by anything else. However it
will significantly improve the debugger's output since we'll now get
real state values.

The default case had to be enabled in the parsers because gcc tries
to optimize the switch/case and noticed some values were missing from
the enums and emitted a warning.
2013-12-09 16:06:22 +01:00
Willy Tarreau
c8987b3664 DIET/MINOR: http: reduce the size of struct http_txn by 8 bytes
Here again we had some oversized and misaligned entries. The method
and the status don't need 4 bytes each, and there was a hole after
the status that does not exist anymore. That's 8 additional bytes
saved from http_txn and as much for the session.

Also some fields were slightly moved to present better memory access
patterns resulting in a steady 0.5% performance increase.
2013-12-09 16:06:22 +01:00
Willy Tarreau
721854f0ac DIET/MINOR: stream-int: rearrange a few fields in struct stream_interface to save 8 bytes
The current and previous states are now packed enums instead of ints. This will
also help in gdb. The flags have been turned to 16-bit instead of 32 since only
10 are used. This resulted in saving 8 bytes per streamm interface, or 16 per
session.
2013-12-09 16:06:21 +01:00
Willy Tarreau
2518db4bfa DIET/MINOR: session: reduce the struct session size by 8 bytes
Move uniq_id upper to fill a hole and kill one. Another hole remains
after store_count.
2013-12-09 16:06:21 +01:00
Willy Tarreau
8379c17adf DIET/MINOR: proxy: rearrange a few fields in struct proxy to save 16 bytes
Turn the proxy state to a packed enum (1 char), same for the proxy mode,
and store the capabitilies as a char. These 3 ints can now fill the hole
after obj_type and save 8 bytes in the proxy struct. Moving the maxconn
value just after, which is frequently accessed and was in a block of 3
ints saved another 8 bytes.
2013-12-09 16:06:21 +01:00
Willy Tarreau
f6502c5062 DIET/MINOR: listener: rearrange a few fields in struct listener to save 16 bytes
Pack the listener state to 1 char, store it as an enum instead of an
int (more gdb-friendly), and move a few fields around to fill holes.

The <nice> field can only be -1024..1024 so it was stored as a signed
short and completes well with obj_type and li_state.

Doing this has reduced the struct listener from 376 to 360 bytes (4.2%).
2013-12-09 16:06:21 +01:00
Willy Tarreau
ad5281ca04 DIET/MINOR: connection: rearrange a few fields to save 8 bytes in the struct
By moving the error code to 8 bits the send_proxy_ofs to 16 bits, and
moving them just after the obj_type, we can save 8 bytes in the struct
connection, down from 328 to 320.
2013-12-09 16:06:15 +01:00
Willy Tarreau
939478d04d DIET/MINOR: obj: pack the obj_type enum to 8 bits
Taking 32-bit in each struct just to store an obj_type is a waste
considering the very small amount of possible values. Let's force
it to be as small as possible (1 char) and we'll be able to move
some structs around to save some space.
2013-12-09 16:06:08 +01:00
Willy Tarreau
4171e9eef0 MEDIUM: stats: delay appctx initialization
Now that the session handler can automatically initialize the appctx,
let's not do it in stats_accept() anymore.
2013-12-09 15:40:23 +01:00
Willy Tarreau
0a23bcb8be MAJOR: stream-interface: dynamically allocate the applet context
From now on, a call to stream_int_register_handler() causes a call
to si_alloc_appctx() and returns an initialized appctx for the
current stream interface. If one was previously allocated, it is
released. If the stream interface was attached to a connection, it
is released as well.

The appctx are allocated from the same pools as the connections, because
they're substantially smaller in size, and we can't have both a connection
and an appctx on an interface at any moment.

In case of memory shortage, the call may return NULL, which is already
handled by all consumers of stream_int_register_handler().

The field appctx was removed from the stream interface since we only
rely on the endpoint now. On 32-bit, the stream_interface size went down
from 108 to 44 bytes. On 64-bit, it went down from 144 to 64 bytes. This
represents a memory saving of 160 bytes per session.

It seems that a later improvement could be to move the call to
stream_int_register_handler() to session.c for most cases.
2013-12-09 15:40:23 +01:00
Willy Tarreau
1fbe1c9ec8 MEDIUM: stream-int: return the allocated appctx in stream_int_register_handler()
The task returned by stream_int_register_handler() is never used, however we
always need to access the appctx afterwards. So make it return the appctx
instead. We already plan for it to fail, which is the reason for the addition
of a few tests and the possibility for the HTTP analyser to return a status
code 500.
2013-12-09 15:40:23 +01:00
Willy Tarreau
7b4b499fde MEDIUM: stream-int: replace occurrences of si->appctx with si_appctx()
We're about to remove si->appctx, so first let's replace all occurrences
of its usage with a dynamic extract from si->end. A lot of code was changed
by search-n-replace, but the behaviour was intentionally not altered.

The code surrounding calls to stream_int_register_handler() was slightly
changed since we can only use si->end *after* the registration.
2013-12-09 15:40:23 +01:00
Willy Tarreau
57cd3e46b9 MEDIUM: connection: merge the send_proxy and local_send_proxy calls
We used to have two very similar functions for sending a PROXY protocol
line header. The reason is that the default one relies on the stream
interface to retrieve the other end's address, while the "local" one
performs a local address lookup and sends that instead (used by health
checks).

Now that the send_proxy_ofs is stored in the connection and not the
stream interface, we can make the local_send_proxy rely on it and
support partial sends. This also simplifies the code by removing the
local_send_proxy function, making health checks use send_proxy_ofs,
resulting in the removal of the CO_FL_LOCAL_SPROXY flag, and the
associated test in the connection handler. The other flag,
CO_FL_SI_SEND_PROXY was renamed without the "SI" part so that it
is clear that it is not dedicated anymore to a usage with a stream
interface.
2013-12-09 15:40:23 +01:00
Willy Tarreau
1ec74bf660 MINOR: connection: check for send_proxy during the connect(), not the SI
It's cleaner to check for a pending send_proxy_ofs while establishing
the connection (which already checks it anyway) and not in the stream
interface.
2013-12-09 15:40:23 +01:00
Willy Tarreau
b8020cefed MEDIUM: connection: move the send_proxy offset to the connection
Till now the send_proxy_ofs field remained in the stream interface,
but since the dynamic allocation of the connection, it makes a lot
of sense to move that into the connection instead of the stream
interface, since it will not be statically allocated for each
session.

Also, it turns out that moving it to the connection fils an alignment
hole on 64 bit architectures so it does not consume more memory, and
removing it from the stream interface was an opportunity to correctly
reorder fields and reduce the stream interface's size from 160 to 144
bytes (-10%). This is 32 bytes saved per session.
2013-12-09 15:40:23 +01:00
Willy Tarreau
32e3c6a607 MAJOR: stream interface: dynamically allocate the outgoing connection
The outgoing connection is now allocated dynamically upon the first attempt
to touch the connection's source or destination address. If this allocation
fails, we fail on SN_ERR_RESOURCE.

As we didn't use si->conn anymore, it was removed. The endpoints are released
upon session_free(), on the error path, and upon a new transaction. That way
we are able to carry the existing server's address across retries.

The stream interfaces are not initialized anymore before session_complete(),
so we could even think about allocating them dynamically as well, though
that would not provide much savings.

The session initialization now makes use of conn_new()/conn_free(). This
slightly simplifies the code and makes it more logical. The connection
initialization code is now shorter by about 120 bytes because it's done
at once, allowing the compiler to remove all redundant initializations.

The si_attach_applet() function now takes care of first detaching the
existing endpoint, and it is called from stream_int_register_handler(),
so we can safely remove the calls to si_release_endpoint() in the
application code around this call.

A call to si_detach() was made upon stream_int_unregister_handler() to
ensure we always free the allocated connection if one was allocated in
parallel to setting an applet (eg: detect HTTP proxy while proceeding
with stats maybe).
2013-12-09 15:40:23 +01:00
Willy Tarreau
2a6e8802c0 MEDIUM: stream-interface: introduce si_attach_conn to replace si_prepare_conn
si_prepare_conn() is not appropriate in our case as it both initializes and
attaches the connection to the stream interface. Due to the asymmetry between
accept() and connect(), it causes some fields such as the control and transport
layers to be reinitialized.

Now that we can separately initialize these fields using conn_prepare(), let's
break this function to only attach the connection to the stream interface.

Also, by analogy, si_prepare_none() was renamed si_detach(), and
si_prepare_applet() was renamed si_attach_applet().
2013-12-09 15:40:23 +01:00
Willy Tarreau
7abddb5c67 MINOR: connection: replace conn_assign with conn_attach
We don't want to assign the control nor transport layers anymore
at the same time as the data layer, because it prevents one from
keeping existing settings when reattaching a connection to an
existing stream interface.

Let's have conn_attach() replace conn_assign() for this purpose.

Thus, conn_prepare() + conn_attach() do exactly the same as the
previous conn_assign().
2013-12-09 15:40:23 +01:00
Willy Tarreau
910c6aa5b7 MINOR: connection: reintroduce conn_prepare to set the protocol and transport
Now that we can assign conn->xprt regardless of the initialization state,
we can reintroduce conn_prepare() to set only the protocol, the transport
layer and initialize the transport layer's state.
2013-12-09 15:40:23 +01:00
Willy Tarreau
3ed35ef05b MINOR: stream-interface: introduce si_reset() and si_set_state()
The first function is used to (re)initialize a stream interface and
the second to force it into a known state. These are intended for
cleaning up the stream interface initialization code in session.c
and peers.c and avoiding future issues with missing initializations.
2013-12-09 15:40:23 +01:00
Willy Tarreau
f79c8171b2 MAJOR: connection: add two new flags to indicate readiness of control/transport
Currently the control and transport layers of a connection are supposed
to be initialized when their respective pointers are not NULL. This will
not work anymore when we plan to reuse connections, because there is an
asymmetry between the accept() side and the connect() side :

  - on accept() side, the fd is set first, then the ctrl layer then the
    transport layer ; upon error, they must be undone in the reverse order,
    then the FD must be closed. The FD must not be deleted if the control
    layer was not yet initialized ;

  - on the connect() side, the fd is set last and there is no reliable way
    to know if it has been initialized or not. In practice it's initialized
    to -1 first but this is hackish and supposes that local FDs only will
    be used forever. Also, there are even less solutions for keeping trace
    of the transport layer's state.

Also it is possible to support delayed close() when something (eg: logs)
tracks some information requiring the transport and/or control layers,
making it even more difficult to clean them.

So the proposed solution is to add two flags to the connection :

  - CO_FL_CTRL_READY is set when the control layer is initialized (fd_insert)
    and cleared after it's released (fd_delete).

  - CO_FL_XPRT_READY is set when the control layer is initialized (xprt->init)
    and cleared after it's released (xprt->close).

The functions have been adapted to rely on this and not on the pointers
anymore. conn_xprt_close() was unused and dangerous : it did not close
the control layer (eg: the socket itself) but still marks the transport
layer as closed, preventing any future call to conn_full_close() from
finishing the job.

The problem comes from conn_full_close() in fact. It needs to close the
xprt and ctrl layers independantly. After that we're still having an issue :
we don't know based on ->ctrl alone whether the fd was registered or not.
For this we use the two new flags CO_FL_XPRT_READY and CO_FL_CTRL_READY. We
now rely on this and not on conn->xprt nor conn->ctrl anymore to decide what
remains to be done on the connection.

In order not to miss some flag assignments, we introduce conn_ctrl_init()
to initialize the control layer, register the fd using fd_insert() and set
the flag, and conn_ctrl_close() which unregisters the fd and removes the
flag, but only if the transport layer was closed.

Similarly, at the transport layer, conn_xprt_init() calls ->init and sets
the flag, while conn_xprt_close() checks the flag, calls ->close and clears
the flag, regardless xprt_ctx or xprt_st. This also ensures that the ->init
and the ->close functions are called only once each and in the correct order.
Note that conn_xprt_close() does nothing if the transport layer is still
tracked.

conn_full_close() now simply calls conn_xprt_close() then conn_full_close()
in turn, which do nothing if CO_FL_XPRT_TRACKED is set.

In order to handle the error path, we also provide conn_force_close() which
ignores CO_FL_XPRT_TRACKED and closes the transport and the control layers
in turns. All relevant instances of fd_delete() have been replaced with
conn_force_close(). Now we always know what state the connection is in and
we can expect to split its initialization.
2013-12-09 15:40:23 +01:00
Willy Tarreau
b97f3b1abf MINOR: connection: add conn_new() / conn_free()
conn_new() will be a more convenient way of allocating and initializing
a connection. It calls pool_alloc2() and conn_init() upon success.

conn_free() is just a pool_free2() but is provided for symmetry with
conn_new().
2013-12-09 15:40:23 +01:00
Willy Tarreau
c10aec299f MINOR: get rid of si_takeover_conn()
Since last commit, this function is an exact copy of si_prepare_conn().
2013-12-09 15:40:23 +01:00
Willy Tarreau
37213433a8 MEDIUM: connection: replace conn_prepare with conn_assign
Everywhere conn_prepare() is used, the call to conn_init() has already
been done. We can now safely replace all instances of conn_prepare()
with conn_assign() which does not reset the transport layer, and remove
conn_prepare().
2013-12-09 15:40:23 +01:00
Willy Tarreau
d015577428 MINOR: connection: add conn_init() to (re)initialize a connection
This function will ease the initialization of new connections as well
as their reuse. It initializes the obj_type and a few fields so that
the connection is fresh again. It leaves the addresses and target
untouched so it is suitable for use across connection retries.
2013-12-09 15:40:23 +01:00
Willy Tarreau
f8a49eab4f MEDIUM: session: attach incoming connection to target on embryonic sessions
In order to reduce the dependency over stream-interfaces, we now
attach the incoming connection to the embryonic session's target
instead of the stream-interface's connection. This means we won't
need to initialize stream interfaces anymore after we implement
dynamic connection allocation. The session's target is reset to
NULL after the session has been converted to a complete session.
2013-12-09 15:40:22 +01:00
Willy Tarreau
b363a1f469 MAJOR: stream-int: stop using si->conn and use si->end instead
The connection will only remain there as a pre-allocated entity whose
goal is to be placed in ->end when establishing an outgoing connection.
All connection initialization can be made on this connection, but all
information retrieved should be applied to the end point only.

This change is huge because there were many users of si->conn. Now the
only users are those who initialize the new connection. The difficulty
appears in a few places such as backend.c, proto_http.c, peers.c where
si->conn is used to hold the connection's target address before assigning
the connection to the stream interface. This is why we have to keep
si->conn for now. A future improvement might consist in dynamically
allocating the connection when it is needed.
2013-12-09 15:40:22 +01:00
Willy Tarreau
691b1f429e CLEANUP: stream-int: remove obsolete si_ctrl function
This function makes no sense anymore and will cause trouble to convert
the remains of connection/applet to end points. Let's replace it now
with its contents.
2013-12-09 15:40:22 +01:00
Willy Tarreau
cf644ed37a MEDIUM: stream-int: make ->end point to the connection or the appctx
The long-term goal is to have a context for applets as an alternative
to the connection and not as a complement. At the moment, the context
is still stored into the stream interface, and we only put a pointer
to the applet's context in si->end, initialize the context with object
type OBJ_TYPE_APPCTX, and this allows us not to allocate an entry when
deciding to switch to an applet.

A special care is taken to never dereference si->conn anymore when
dealing with an applet. That's why it's important that si->end is
always set to the proper type :

    si->end == NULL             => not connected to anything
   *si->end == OBJ_TYPE_APPCTX  => connected to an applet
   *si->end == OBJ_TYPE_CONN    => real connection (server, proxy, ...)

The session management code used to check the applet from the connection's
target. Now it uses the stream interface's end point and does not touch the
connection at all. Similarly, we stop checking the connection's addresses
and file descriptors when reporting the applet's status in the stats dump.
2013-12-09 15:40:22 +01:00
Willy Tarreau
4a59f2f954 MAJOR: stream interface: remove the ->release function pointer
Since last commit, we now have a pointer to the applet in the
applet context. So we don't need the si->release function pointer
anymore, it can be extracted from applet->applet.release. At many
places, the ->release function was still tested for real connections
while it is only limited to applets, so most of them were simply
removed. For the remaining valid uses, a new inline function
si_applet_release() was added to simplify the check and the call.
2013-12-09 15:40:22 +01:00
Willy Tarreau
48099c7a07 MEDIUM: stream-interface: set the pointer to the applet into the applet context
In preparation for a later move of all the applet context outside of the
stream interface, we'll need to have access to the applet itself from the
context. Let's have a pointer to it inside the context.
2013-12-09 15:40:22 +01:00
Willy Tarreau
7d67d7b9e5 MINOR: stream-int: add a new pointer to the end point
The end point will correspond to either an applet context or a connection,
depending on the object type. For now the pointer remains null.
2013-12-09 15:40:22 +01:00
Willy Tarreau
372d6708fb MINOR: stream-int: split si_prepare_embedded into si_prepare_none and si_prepare_applet
si_prepare_embedded() was used both to attach an applet and to detach
anything from a stream interface. Split it into si_prepare_none() to
detach and si_prepare_applet() to attach an applet.

si->conn->target is now assigned from within these two functions instead
of their respective callers.
2013-12-09 15:40:22 +01:00
Willy Tarreau
9b6c2c721e MINOR: stream-int: rename ->applet to ->appctx
Since this is the applet context, call it ->appctx to avoid the confusion
with the pointer to the applet. Many places were changed but it's only a
renaming.
2013-12-09 15:40:22 +01:00
Willy Tarreau
0788f47cc1 MINOR: obj: introduce a new type appctx
The object type was added to "struct appctx". The purpose will be
to identify an appctx when the applet context is detached from the
stream interface. For now, it's still attached, so this patch only
adds the new type and does not replace its use.
2013-12-09 15:40:22 +01:00
Willy Tarreau
452d3bb0c4 MINOR: stream-interface: move the applet context to its own struct
In preparation of making the applet context dynamically allocatable,
we create a "struct appctx". Nothing else was changed, it's the same
struct as the one which was in the stream interface.
2013-12-09 15:40:22 +01:00
Willy Tarreau
f4acee332b MEDIUM: stream interface: move the peers' ptr into the applet context
A long time ago when peers were introduced, there was no applet nor
applet context. Applet contexts were introduced but the peers still
did not make use of them and the "ptr" pointer remains present in
every stream interface in addition to the other contexts.

Simply move this pointer to its own location in the context.

Note that this pointer is still a void* because its type and contents
varies depending on the peers session state. Probably that this could
be cleaned up in the future given that all other contexts already store
much more than a single pointer.
2013-12-09 15:40:22 +01:00
Willy Tarreau
51c2184755 MINOR: connection: add a field to store an object type
This will soon be used to differenciate connections from applet
contexts. Object type "connection" has also been added.
2013-12-09 15:40:22 +01:00
Willy Tarreau
66337a0784 MINOR: obj: provide a safe and an unsafe access to pointed objects
Most of the times, the caller of objt_<type>(ptr) will know that <ptr>
is valid and of the correct type (eg: in an "if" condition). Let's provide
an unsafe variant that does not perform the check again for these usages.
The new functions are called "__objt_<type>".
2013-12-09 15:40:22 +01:00
Willy Tarreau
6fe1541285 MINOR: stream-int: make the shutr/shutw functions void
This is to be more consistent with the other functions. The only
reason why these functions used to return a value was to let the
caller adjust polling by itself, but now their only callers were
the si_shutr()/si_shutw() inline functions. Now these functions
do not depend anymore on the connection.

These connection variant of these functions now call
conn_data_stop_recv()/conn_data_stop_send() before returning order
not to require a return code anymore. The applet version does not
need this at all.
2013-12-09 15:40:22 +01:00
Willy Tarreau
8b3d7dfd7c MEDIUM: stream-int: split the shutr/shutw functions between applet and conn
These functions induce a lot of ifs everywhere because they consider two
different cases, one which is where the connection exists and has a file
descriptor, and the other one which is the default case where at most an
applet has to be notified.

Let's have them in si_ops and automatically decide which one to use.

The connection shutdown sequence has been slightly simplified, and we
now clear the flags at the end.

Also we remove SHUTR_NOW after a shutw with nolinger, as it's cleaner
not to keep it.
2013-12-09 15:40:22 +01:00
Willy Tarreau
347a35d19e MAJOR: stats: move the HTTP stats handling to its applet
There is a big trouble with the way POST is handled for the admin
stats page. The POST parameters are extracted from some http-request
rules, and if not round they return zero hoping for being called again
when more data passes. This results in the HTTP analyser being called
several times and all the rules prior to the stats being executed
multiple times as well. That includes rewrite rules.

So instead of doing this, we now move all the processing of the stats
into the stats applet.

That way we just set the stats applet in the HTTP analyser when a stats
request is detected, and the applet takes the time it needs to read the
arguments and respond. We could even imagine improving the applet to
support requests larger than a single buffer.

The code was almost only moved and minimally changed. Several new HTTP
states were added to the stats applet to emit headers, redirects and
to read POST. It was necessary to do this because the headers sent
depend on the parsing of the POST request. In the end it's beneficial
because we removed two stream_int_retnclose() calls.
2013-12-09 15:40:22 +01:00
Willy Tarreau
96d44918f7 MEDIUM: stats: prepare the HTTP stats I/O handler to support more states
In preparation for moving the POST processing to the applet, we first
add new states to the HTTP I/O handler. Till now st0 was only 0/1 for
start/end. We now replace it with an enum.
2013-12-09 15:40:22 +01:00
Willy Tarreau
9f68148321 MEDIUM: peers: don't rely on conn->xprt_ctx anymore
We make the peers code use applet->ptr instead of conn->xprt_ctx to
store the pointer to the current peer. That way it does not depend
on a connection anymore.
2013-12-09 15:40:21 +01:00
Willy Tarreau
787add2932 MINOR: session: add a simple function to retrieve a session from a task
This function only casts t->context to (struct session *). It will
avoid some ugly and unsafe casts in upcoming changes.
2013-12-09 15:40:21 +01:00
Willy Tarreau
a94d2d7653 MEDIUM: stats: don't use conn->xprt_st anymore
We're trying to move the applets out of the struct connection. So
let's remove the dependence on xprt_st and introduce si->applet.st2
to store the missing contextual data instead.
2013-12-09 15:40:21 +01:00
Willy Tarreau
08382955fe CLEANUP: stream_interface: remove unused field err_loc
This field was still fed with a pointer to the server that caught an
error but was not used anymore. Let's remove it.
2013-12-09 15:40:21 +01:00
Willy Tarreau
37e340ce4b BUG/MEDIUM: stick: completely remove the unused flag from the store entries
The store[] array in the session holds a flag which probably aimed to
differenciate store entries learned from the request from those learned
from the response, and allowing responses to overwrite only the request
ones (eg: have a server set a response cookie which overwrites the request
one).

But this flag is set when a response data is stored, and is never cleared.
So in practice, haproxy always runs with this flag set, meaning that
responses prevent themselves from overriding the request data.

It is desirable anyway to keep the ability not to override data, because
the override is performed only based on the table and not on the key, so
that would mean that it would be impossible to retrieve two different
keys to store into a same table. For example, if a client sets a cookie
and a server another one, both need to be updated in the table in the
proper order. This is especially true when multiple keys may be tracked
on each side into the same table (eg: list of IP addresses in a header).

So the correct fix which also maintains the current behaviour consists in
simply removing this flag and never try to optimize for the overwrite case.

This fix also has the benefit of significantly reducing the session size,
by 64 bytes due to alignment issues caused by this flag!

The bug has been there forever (since 1.4-dev7), so a backport to 1.4
would be appropriate.
2013-12-06 23:14:53 +01:00
Willy Tarreau
98aec9ff47 BUG/MINOR: checks: tcp-check actions are enums, not flags
In recent commit 5ecb77f (MEDIUM: checks: add send/expect tcp based check),
bitfields were mistakenly used at some places for the actions. Fortunately,
the only two actions right now are 1 and 2 so they don't share any bit in
common and the bug has no impact.

No backport is needed.
2013-12-06 16:16:41 +01:00
Baptiste Assmann
5ecb77f4c7 MEDIUM: checks: add send/expect tcp based check
This is a generic health check which can be used to match a
banner or send a request and analyse a server response.
It works in a send/expect ways and many exchange can be done between
HAProxy and a server to decide the server status, making HAProxy able to
speak the server's protocol.

It can send arbitrary regular or binary strings and match content as a
regular or binary string or a regex.

Signed-off-by: Baptiste Assmann <bedis9@gmail.com>
2013-12-06 11:50:47 +01:00
Baptiste Assmann
bb77c8e26d MINOR: tools: function my_memmem() to lookup binary contents
This function simply looks for a memory block inside another one.

Signed-off-by: Baptiste Assmann <bedis9@gmail.com>
2013-12-06 11:50:47 +01:00
Willy Tarreau
126d40691a MINOR: tools: add a generic binary hex string parser
We currently use such an hex parser in pat_parse_bin() to parse hex
string patterns. We'll need another generic one so let's move it to
standard.c and have pat_parse_bin() make use of it.
2013-12-06 11:50:47 +01:00
Thierry FOURNIER
0ffe78cfe3 MEDIUM: map: merge identical maps
This patch permits to use the same struct pattern for two indentical maps.
This permits to preserve memory, and permits to update only one
"struct pattern" when the dynamic map update is supported.
2013-12-06 11:40:53 +01:00
Thierry FOURNIER
275db69c07 BUG/MINOR: map: The map list was declared in the map.h file
This bug is harmless and post-dev19, it does not require any backport.
2013-12-06 11:37:28 +01:00
Thierry FOURNIER
d18cd0f110 MEDIUM: http: The redirect strings follows the log format rules.
We handle "http-request redirect" with a log-format string now, but we
leave "redirect" unaffected.

Note that the control of the special "/" case is move from the runtime
execution to the configuration parsing. If the format rule list is
empty, the build_logline() function does nothing.
2013-12-02 23:31:33 +01:00
Thierry FOURNIER
d5f624dde7 MEDIUM: sample: add the "map" converter
Add a new converter with the following prototype :

  map(<map_file>[,<default_value>])
  map_<match_type>(<map_file>[,<default_value>])
  map_<match_type>_<output_type>(<map_file>[,<default_value>])

It searches the for input value from <map_file> using the <match_type>
matching method, and return the associated value converted to the type
<output_type>. If the input value cannot be found in the <map_file>,
the converter returns the <default_value>. If the <default_value> is
not set, the converter fails and acts as if no input value could be
fetched. If the <match_type> is not set, it defaults to "str".
Likewise, if the <output_type> is not set, it defaults to "str". For
convenience, the "map" keyword is an alias for "map_str" and maps a
string to another string. The following array contains contains the
list of all the map* converters.

                 +----+----------+---------+-------------+------------+
                 |     `-_   out |         |             |            |
                 | input  `-_    |   str   |     int     |     ip     |
                 | / match   `-_ |         |             |            |
                 +---------------+---------+-------------+------------+
                 | str   / str   | map_str | map_str_int | map_str_ip |
                 | str   / sub   | map_sub | map_sub_int | map_sub_ip |
                 | str   / dir   | map_dir | map_dir_int | map_dir_ip |
                 | str   / dom   | map_dom | map_dom_int | map_dom_ip |
                 | str   / end   | map_end | map_end_int | map_end_ip |
                 | str   / reg   | map_reg | map_reg_int | map_reg_ip |
                 | int   / int   | map_int | map_int_int | map_int_ip |
                 | ip    / ip    | map_ip  | map_ip_int  | map_ip_ip  |
                 +---------------+---------+-------------+------------+

The names are intentionally chosen to reflect the same match methods
as ACLs use.
2013-12-02 23:31:33 +01:00
Thierry FOURNIER
4b5e422759 MINOR: map: Define map types
Define the types used with maps, and add new argument type that can
reference the map. This pointer contains the map configuration entries.
2013-12-02 23:31:33 +01:00
Thierry FOURNIER
fdbf4842b6 MINOR: sample: add a private field to the struct sample_conv
These flags will be used for maps, and possibly later to pass some
extra information to other converters if needed.
2013-12-02 23:31:33 +01:00
Thierry FOURNIER
b805f71d1b MEDIUM: sample: let the cast functions set their output type
This patch allows each sample cast function to specify the sample
output type. The goal is to be able to emit an output type IPv4 or
IPv6 depending on what is found in the input if the next converter
is able to process them both.

The patch also adds a new pseudo type called "ADDR". This type is an
alias for IPV4 and IPV6 which is only used as an input type by converters
who want to express their compatibility with both address formats. It may
not be emitted.

The goal is to unify as much as possible the processing of IPv4 and IPv6
in order not to add extra keywords for the maps which act as converters,
but will match samples like ACLs do with their patterns.
2013-12-02 23:31:33 +01:00
Willy Tarreau
6f8fe310cf MINOR: pattern: import acl_find_match_name() into pattern.h
It's only dedicated to pattern match lookups, so it was renamed
pat_find_match_name().
2013-12-02 23:31:33 +01:00
Willy Tarreau
0cba607400 MINOR: acl/pattern: use types different from int to clarify who does what.
We now have the following enums and all related functions return them and
consume them :

   enum pat_match_res {
	PAT_NOMATCH = 0,         /* sample didn't match any pattern */
	PAT_MATCH = 3,           /* sample matched at least one pattern */
   };

   enum acl_test_res {
	ACL_TEST_FAIL = 0,           /* test failed */
	ACL_TEST_MISS = 1,           /* test may pass with more info */
	ACL_TEST_PASS = 3,           /* test passed */
   };

   enum acl_cond_pol {
	ACL_COND_NONE,		/* no polarity set yet */
	ACL_COND_IF,		/* positive condition (after 'if') */
	ACL_COND_UNLESS,	/* negative condition (after 'unless') */
   };

It's just in order to avoid doubts when reading some code.
2013-12-02 23:31:33 +01:00
Thierry FOURNIER
a65b343eee MEDIUM: pattern: rename "acl" prefix to "pat"
This patch just renames functions, types and enums. No code was changed.
A significant number of files were touched, especially the ACL arrays,
so it is likely that some external patches will not apply anymore.

One important thing is that we had to split ACL_PAT_* into two groups :
  - ACL_TEST_{PASS|MISS|FAIL}
  - PAT_{MATCH|UNMATCH}

A future patch will enforce enums on all these places to avoid confusion.
2013-12-02 23:31:33 +01:00
Thierry FOURNIER
d163e1ce30 MEDIUM: pattern: create pattern expression
This new structure contains the data needed for pattern matching. It's
the first step to the complete independance of the pattern matching.
2013-12-02 23:31:33 +01:00
Thierry FOURNIER
ed66c297c2 REORG: acl/pattern: extract pattern matching from the acl file and create pattern.c
This patch just moves code without any change.

The ACL are just the association between sample and pattern. The pattern
contains the match method and the parse method. These two things are
different. This patch cleans the code by splitting it.
2013-12-02 23:31:33 +01:00
Thierry FOURNIER
dd69a04666 MEDIUM: acl: associate "struct sample_storage" to each "struct acl_pattern"
This will be used later with maps. Each map will associate an entry with
a sample_storage value.

This patch changes the "parse" prototype and all the parsing methods.
The goal is to associate "struct sample_storage" to each entry of
"struct acl_pattern". Only the "parse" function can add the sample value
into the "struct acl_pattern".
2013-12-02 23:31:33 +01:00
Thierry FOURNIER
8ed9697064 MINOR: sample: Define new struct sample_storage
This struct is used to store a sample constant. The size of this
struct is less than the struct sample. This struct only contains
a constant and doesn't need the "ctx" nor the "flags".
2013-12-02 23:31:33 +01:00
Thierry FOURNIER
29d47b87c4 MINOR: acl: Extract the pattern matching function
The map feature will need to match acl patterns. This patch extracts
the matching function from the global ACL function "acl_exec_cond".

The code was only moved to its own function, no functional changes were made.
2013-12-02 23:31:33 +01:00
Thierry FOURNIER
3a103c5a6b MINOR: acl: Extract the pattern parsing and indexation from the "acl_read_patterns_from_file()" function
With this split, the pattern indexation can apply to any source. The map
feature needs this functionality because the map cannot be loaded with the
same file format as the ones supported by acl_read_patterns_from_file().

The code was only moved to its own function, no functional changes were made.
2013-12-02 23:31:33 +01:00
Thierry FOURNIER
319e495a96 MINOR: acl: export acl arrays
The map feature needs to use the acl parser and converters.
2013-12-02 23:31:32 +01:00
Thierry FOURNIER
d559dd8390 MINOR: tools: Add a function to convert buffer to an ipv6 address
The inet_pton function needs an input string with a final \0. This
function copies the input string to a temporary buffer, adds the final
\0 and converts to address.
2013-12-02 23:31:32 +01:00
Thierry FOURNIER
9c1d67ecbd MINOR: sample: provide the original sample_conv descriptor struct to the argument checker function.
Note that this argument checker is still unused but will be used by
maps.
2013-12-02 23:31:32 +01:00
Thierry FOURNIER
348971ea28 MEDIUM: acl: use the fetch syntax 'fetch(args),conv(),conv()' into the ACL keyword
If the acl keyword is a "fetch", the dedicated parsing function
"sample_parse_expr()" is used. Otherwise, the acl parsing function
"parse_acl_expr()" is extended to understand the syntax of a series
of converters placed after the "fetch" keyword.

Before this patch, each acl uses a "struct sample_fetch" and executes
it with the "<fetch>->process()" function. Now, the dedicated function
"sample_process()" is called.

These syntax are now avalaible:

   acl bad req.hdr(host),lower -m str www
   http-request redirect prefix /go-away if bad

   acl bad hdr_beg(host),lower www
   http-request redirect prefix /go-away if bad
2013-12-02 23:31:32 +01:00
Thierry FOURNIER
8af6ff12b5 MINOR: sample: export sample_casts
just export the sample cast matrix "sample_casts" to prepare the
generic sample conversion parser.
2013-12-02 23:31:32 +01:00
Thierry FOURNIER
20f4996738 MINOR: sample: export the generic sample conversion parser
just export function "find_sample_conv()" to prepare the
generic sample conversion parser.
2013-12-02 23:31:32 +01:00
Willy Tarreau
34c2fb6f89 BUG/MINOR: config: report the correct track-sc number in tcp-rules
When parsing track-sc* actions in tcp-request rules, we now automatically
compute the track-sc identifier number using %d when displaying an error
message. But the ID has become wrong since we introduced sc0, we continue
to report id+1 in error messages causing some confusion.

No backport is needed.
2013-12-02 23:31:32 +01:00
Willy Tarreau
830bf61815 BUG/MINOR: connection: fix typo in error message report
"unknownn" -> "unknown"
2013-12-01 20:29:58 +01:00
Thierry FOURNIER
1c0054fe83 BUG/MINOR: arg: fix error reporting for add-header/set-header sample fetch arguments
The 'add-header %[samples]' parsing errors associated to http-request
and http-response are displayed with the wrong keyword.

Configuration entry:

   http-request set-header mon-header %[res.hdr(user-agent)]

Original error message:

   [WARNING] 323/150920 (16559) : parsing [haproxy.conf:36] : 'log-format' : sample fetch <res.hdr ...

After commit error message:

   [WARNING] 323/150929 (16580) : parsing [haproxy.conf:36] : 'http-request' : sample fetch <res.hdr ...
2013-11-28 18:25:18 +01:00
Simon Horman
8c3d0be987 MEDIUM: Add DRAIN state and report it on the stats page
Add a DRAIN sub-state for a server which
will be shown on the stats page instead of UP if
its effective weight is zero.

Also, log if a server enters or leaves the DRAIN state
as the result of an agent check.

Signed-off-by: Simon Horman <horms@verge.net.au>
2013-11-25 07:31:16 +01:00
Simon Horman
671b6f02b5 MEDIUM: Add enable and disable agent unix socket commands
The syntax of this new commands are:

enable agent <backend>/<server>
disable agent <backend>/<server>

These commands allow temporarily stopping and subsequently
re-starting an auxiliary agent check. The effect of this is as follows:

New checks are only initialised when the agent is in the enabled. Thus,
disable agent will prevent any new agent checks from begin initiated until
the agent re-enabled using enable agent.

When an agent is disabled the processing of an auxiliary agent check that
was initiated while the agent was set as enabled is as follows: All
results that would alter the weight, specifically "drain" or a weight
returned by the agent, are ignored. The processing of agent check is
otherwise unchanged.

The motivation for this feature is to allow the weight changing effects
of the agent checks to be paused to allow the weight of a server to be
configured using set weight without being overridden by the agent.

Signed-off-by: Simon Horman <horms@verge.net.au>
2013-11-25 07:31:16 +01:00
Simon Horman
58c32978b2 MEDIUM: Set rise and fall of agent checks to 1
This is achieved by moving rise and fall from struct server to struct check.

After this move the behaviour of the primary check, server->check is
unchanged. However, the secondary agent check, server->agent now has
independent rise and fall values each of which are set to 1.

The result is that receiving "fail", "stopped" or "down" just once from the
agent will mark the server as down. And receiving a weight just once will
allow the server to be marked up if its primary check is in good health.

This opens up the scope to allow the rise and fall values of the agent
check to be configurable, however this has not been implemented at this
stage.

Signed-off-by: Simon Horman <horms@verge.net.au>
2013-11-25 07:31:16 +01:00
Simon Horman
d60d69138b MEDIUM: checks: Add supplementary agent checks
Allow an auxiliary agent check to be run independently of the
regular a regular health check. This is enabled by the agent-check
server setting.

The agent-port, which specifies the TCP port to use for the agent's
connections, is required.

The agent-inter, which specifies the interval between agent checks and
timeout of agent checks, is optional. If not set the value for regular
checks is used.

e.g.
server	web1_1 127.0.0.1:80 check agent-port 10000

If either the health or agent check determines that a server is down
then it is marked as being down, otherwise it is marked as being up.

An agent health check performed by opening a TCP socket and reading an
ASCII string. The string should have one of the following forms:

* An ASCII representation of an positive integer percentage.
  e.g. "75%"

  Values in this format will set the weight proportional to the initial
  weight of a server as configured when haproxy starts.

* The string "drain".

  This will cause the weight of a server to be set to 0, and thus it
  will not accept any new connections other than those that are
  accepted via persistence.

* The string "down", optionally followed by a description string.

  Mark the server as down and log the description string as the reason.

* The string "stopped", optionally followed by a description string.

  This currently has the same behaviour as "down".

* The string "fail", optionally followed by a description string.

  This currently has the same behaviour as "down".

Signed-off-by: Simon Horman <horms@verge.net.au>
2013-11-25 07:31:16 +01:00
Willy Tarreau
d32c399747 MINOR: stats: report correct throttling percentage for servers in slowstart
The column used to report the throttle percentage when a server is in
slowstart is based on the time only. This is wrong, because server weights
in slowstart are updated at most once a second, so the reported value is
wrong at least fo rone second during each step, which means all the time
when using short delays (< 20s).

The second point is that it's disturbing to see a weight < 100% without
any throttle at the end of the period (during the last second), because
the effective weight has not yet been updated.

Instead, we now compute the exact ratio between eweight and uweight and
report it. It's always accurate and describes the value being used instead
of using only the date.

It can be backported to 1.4 though it's not particularly important.
2013-11-21 15:30:45 +01:00
Willy Tarreau
004e045f31 BUG/MAJOR: server: weight calculation fails for map-based algorithms
A crash was reported by Igor at owind when changing a server's weight
on the CLI. Lukas Tribus could reproduce a related bug where setting
a server's weight would result in the new weight being multiplied by
the initial one. The two bugs are the same.

The incorrect weight calculation results in the total farm weight being
larger than what was initially allocated, causing the map index to be out
of bounds on some hashes. It's easy to reproduce using "balance url_param"
with a variable param, or with "balance static-rr".

It appears that the calculation is made at many places and is not always
right and not always wrong the same way. Thus, this patch introduces a
new function "server_recalc_eweight()" which is dedicated to this task
of computing ->eweight from many other elements including uweight and
current time (for slowstart), and all users now switch to use this
function.

The patch is a bit large but the code was not trivially fixable in a way
that could guarantee this situation would not occur anymore. The fix is
much more readable and has been verified to work with all algorithms,
with both consistent and map-based hashes, and even with static-rr.

Slowstart was tested as well, just like enable/disable server.

The same bug is very likely present in 1.4 as well, so the patch will
probably need to be backported eventhough it will not apply as-is.

Thanks to Lukas and Igor for the information they provided to reproduce it.
2013-11-21 15:09:02 +01:00
Simon Horman
125d099662 MEDIUM: Move health element to struct check
This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.

Signed-off-by: Simon Horman <horms@verge.net.au>
2013-11-19 09:36:07 +01:00
Simon Horman
cd5d7b678e MEDIUM: Add state to struct check
Add state to struct check. This is currently used to store one bit,
CHK_RUNNING, which is set if a check is running and clear otherwise.
This bit was previously SRV_CHK_RUNNING of the state element of struct
server.

This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
2013-11-19 09:36:04 +01:00
Simon Horman
4a741432be MEDIUM: Paramatise functions over the check of a server
Paramatise the following functions over the check of a server

* set_server_down
* set_server_up
* srv_getinter
* server_status_printf
* set_server_check_status
* set_server_disabled
* set_server_enabled

Generally the server parameter of these functions has been removed.
Where it is still needed it is obtained using check->server.

This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.
By paramatising these functions they may act on each of the checks
without further significant modification.

Explanation of the SSP_O_HCHK portion of this change:

* Prior to this patch SSP_O_HCHK serves a single purpose which
  is to tell server_status_printf() weather it should print
  the details of the check of a server or not.

  With the paramatisation that this patch adds there are two cases.
  1) Printing the details of the check in which case a
     valid check parameter is needed.
  2) Not printing the details of the check in which case
     the contents check parameter are unused.

  In case 1) we could pass SSP_O_HCHK and a valid check and;
  In case 2) we could pass !SSP_O_HCHK and any value for check
  including NULL.

  If NULL is used for case 2) then SSP_O_HCHK becomes supurfulous
  and as NULL is used for case 2) SSP_O_HCHK has been removed.

Signed-off-by: Simon Horman <horms@verge.net.au>
2013-11-19 09:35:54 +01:00
Simon Horman
28b5ffc76f MEDIUM: Move result element to struct check
Move result element from struct server to struct check
This allows check results to be independent of the check's server.

This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.

Signed-off-by: Simon Horman <horms@verge.net.au>
2013-11-19 09:35:52 +01:00
Simon Horman
6618300e13 MEDIUM: Split up struct server's check element
This is in preparation for associating a agent check
with a server which runs as well as the server's existing check.

The split has been made by:
* Moving elements of struct server's check element that will
  be shared by both checks into a new check_common element
  of struct server.
* Moving the remaining elements to a new struct check and
  making struct server's check element a struct check.
* Adding a server element to struct check, a back-pointer
  to the server element it is a member of.
  - At this time the server could be obtained using
    container_of, however, this will not be so easy
    once a second struct check element is added to struct server
    to accommodate an agent health check.

Signed-off-by: Simon Horman <horms@verge.net.au>
2013-11-19 09:35:48 +01:00
Simon Horman
c69d547638 CLEANUP: Remove unused 'last_slowstart_change' field from struct peer
This was inadvertently added by "MEDIUM: checks: Add agent health check".
It appears to have never been used.

Signed-off-by: Simon Horman <horms@verge.net.au>
2013-11-19 08:04:59 +01:00
Simon Horman
a360844735 CLEANUP: Make parameters of srv_downtime and srv_getinter const
The parameters of srv_downtime and srv_getinter are not modified
and thus may be const.

Signed-off-by: Simon Horman <horms@verge.net.au>
2013-11-19 08:04:58 +01:00
Willy Tarreau
a0f4271497 MEDIUM: backend: add support for the wt6 hash
This function was designed for haproxy while testing other functions
in the past. Initially it was not planned to be used given the not
very interesting numbers it showed on real URL data : it is not as
smooth as the other ones. But later tests showed that the other ones
are extremely sensible to the server count and the type of input data,
especially DJB2 which must not be used on numeric input. So in fact
this function is still a generally average performer and it can make
sense to merge it in the end, as it can provide an alternative to
sdbm+avalanche or djb2+avalanche for consistent hashing or when hashing
on numeric data such as a source IP address or a visitor identifier in
a URL parameter.
2013-11-14 16:37:50 +01:00
Bhaskar Maddala
b6c0ac94a4 MEDIUM: backend: Implement avalanche as a modifier of the hashing functions.
Summary:
Avalanche is supported not as a native hashing choice, but a modifier
on the hashing function. Note that this means that possible configs
written after 1.5-dev4 using "hash-type avalanche" will get an informative
error instead. But as discussed on the mailing list it seems nobody ever
used it anyway, so let's fix it before the final 1.5 release.

The default values were selected for backward compatibility with previous
releases, as discussed on the mailing list, which means that the consistent
hashing will still apply the avalanche hash by default when no explicit
algorithm is specified.

Examples
  (default) hash-type map-based
	Map based hashing using sdbm without avalanche

  (default) hash-type consistent
	Consistent hashing using sdbm with avalanche

Additional Examples:

  (a) hash-type map-based sdbm
	Same as default for map-based above
  (b) hash-type map-based sdbm avalanche
	Map based hashing using sdbm with avalanche
  (c) hash-type map-based djb2
	Map based hashing using djb2 without avalanche
  (d) hash-type map-based djb2 avalanche
	Map based hashing using djb2 with avalanche
  (e) hash-type consistent sdbm avalanche
	Same as default for consistent above
  (f) hash-type consistent sdbm
	Consistent hashing using sdbm without avalanche
  (g) hash-type consistent djb2
	Consistent hashing using djb2 without avalanche
  (h) hash-type consistent djb2 avalanche
	Consistent hashing using djb2 with avalanche
2013-11-14 16:37:50 +01:00
Bhaskar
98634f0c7b MEDIUM: backend: Enhance hash-type directive with an algorithm options
Summary:
In testing at tumblr, we found that using djb2 hashing instead of the
default sdbm hashing resulted is better workload distribution to our backends.

This commit implements a change, that allows the user to specify the hash
function they want to use. It does not limit itself to consistent hashing
scenarios.

The supported hash functions are sdbm (default), and djb2.

For a discussion of the feature and analysis, see mailing list thread
"Consistent hashing alternative to sdbm" :

      http://marc.info/?l=haproxy&m=138213693909219

Note: This change does NOT make changes to new features, for instance,
applying an avalance hashing always being performed before applying
consistent hashing.
2013-11-14 16:37:50 +01:00
Thierry FOURNIER
a054d410db BUILD/MINOR: missing header file
In the header file "types/proto_http.h", the list are used
but the header file "mini-clist.h" is not included.
2013-10-23 15:53:56 +02:00
Thierry FOURNIER
de6617b486 MINOR: http: some exported functions were not in the header file
Export the following functions:
 - find_hdr_value_end
 - http_header_match2
 - http_remove_header2
 - http_header_add_tail2
2013-10-23 12:21:38 +02:00
Thierry FOURNIER
ef37a66628 CLEANUP: The function "regex_exec" needs the string length but in many case they expect null terminated char.
If haproxy is compiled with the USE_PCRE_JIT option, the length of the
string is used. If it is compiled without this option the function doesn't
use the length and expects a null terminated string.

The prototype of the function is ambiguous, and depends on the
compilation option. The developer can think that the length is always
used, and many bugs can be created.

This patch makes sure that the length is used. The regex_exec function
adds the final '\0' if it is needed.
2013-10-23 12:19:51 +02:00
Thierry FOURNIER
ed5a4aefae CLEANUP: regex: Create regex_comp function that compiles regex using compilation options
The current file "regex.h" define an abstraction for the regex. It
provides the same struct name and the same "regexec" function for the
3 regex types supported: standard libc, basic pcre and jit pcre.

The regex compilation function is not provided by this file. If the
developper wants to use regex, he must write regex compilation code
containing "#define *JIT*".

This patch provides a unique regex compilation function according to
the compilation options.

In addition, the "regex.h" file checks the presence of the "#define
PCRE_CONFIG_JIT" when "USE_PCRE_JIT" is enabled. If this flag is not
present, the pcre lib doesn't support JIT and "#error" is emitted.
2013-10-14 14:42:50 +02:00
Thierry FOURNIER
e28f1ecf2b BUILD/MINOR: missing header file
In the header file "common/regex.h", the C keyword NULL is used. This
keyword is referenced into the header file "stdlib.h", but this is not
included.
2013-10-10 11:38:35 +02:00
Godbach
2b8fd54287 DOC: fix typo in comments
Hi Willy,

There is a patch to fix typo in comments, please check the attachment
for you information.

The commit log is as below:

commit 9824d1b3740ac2746894f1aa611c795366c84210
Author: Godbach <nylzhaowei@gmail.com>
Date:   Mon Sep 30 11:05:42 2013 +0800

    DOC: fix typo in comments

      0x20000000 -> 0x40000000
      vuf -> buf
      ethod -> Method

    Signed-off-by: Godbach <nylzhaowei@gmail.com>

--
Best Regards,
Godbach

From 9824d1b3740ac2746894f1aa611c795366c84210 Mon Sep 17 00:00:00 2001
From: Godbach <nylzhaowei@gmail.com>
Date: Mon, 30 Sep 2013 11:05:42 +0800
Subject: [PATCH] DOC: fix typo in comments

  0x20000000 -> 0x40000000
  vuf -> buf
  ethod -> Method

Signed-off-by: Godbach <nylzhaowei@gmail.com>
2013-10-01 09:49:21 +02:00
Willy Tarreau
cc1e04b1e8 MINOR: tcp: add new "close" action for tcp-response
This new action immediately closes the connection with the server
when the condition is met. The first such rule executed ends the
rules evaluation. The main purpose of this action is to force a
connection to be finished between a client and a server after an
exchange when the application protocol expects some long time outs
to elapse first. The goal is to eliminate idle connections which
take signifiant resources on servers with certain protocols.
2013-09-11 23:28:51 +02:00
Willy Tarreau
3a925c155d MEDIUM: stick-tables: flush old entries upon soft-stop
When a process with large stick tables is replaced by a new one and remains
present until the last connection finishes, it keeps these data in memory
for nothing since they will never be used anymore by incoming connections,
except during syncing with the new process. This is especially problematic
when dealing with long session protocols such as WebSocket as it becomes
possible to stack many processes and eat a lot of memory.

So the idea here is to know if a table still needs to be synced or not,
and to purge all unused entries once the sync is complete. This means that
after a few hundred milliseconds when everything has been synchronized with
the new process, only a few entries will remain allocated (only the ones
held by sessions during the restart) and all the remaining memory will be
freed.

Note that we carefully do that only after the grace period is expired so as
not to impact a possible proxy that needs to accept a few more connections
before leaving.

Doing this required to add a sync counter to the stick tables, to know how
many peer sync sessions are still in progress in order not to flush the entries
until all synchronizations are completed.
2013-09-04 17:54:01 +02:00
Evan Broder
be55431f9f MINOR: ssl: Add statement 'verifyhost' to "server" statements
verifyhost allows you to specify a hostname that the remote server's
SSL certificate must match. Connections that don't match will be
closed with an SSL error.
2013-09-01 07:55:49 +02:00
Willy Tarreau
9f09521f2d BUG/MEDIUM: unique_id: HTTP request counter must be unique!
The HTTP request counter is incremented non atomically, which means that
many requests can log the same ID. Let's increment it when it is consumed
so that we avoid this case.

This bug was reported by Patrick Hemmer. It's 1.5-specific and does not
need to be backported.
2013-08-13 17:52:20 +02:00
Willy Tarreau
47060b6ae0 MINOR: cli: make it possible to enter multiple values at once with "set table"
The "set table" statement allows to create new entries with their respective
values. Till now it was limited to a single data type per line, requiring as
many "set table" statements as the desired data types to be set. Since this
is only a parser limitation, this patch gets rid of it. It also allows the
creation of a key with no data types (all reset to their default values).
2013-08-01 21:17:19 +02:00
Willy Tarreau
b4c8493a9f MINOR: session: make the number of stick counter entries more configurable
In preparation of more flexibility in the stick counters, make their
number configurable. It still defaults to 3 which is the minimum
accepted value. Changing the value alone is not sufficient to get
more counters, some bitfields still need to be updated and the TCP
actions need to be updated as well, but this update tries to be
easier, which is nice for experimentation purposes.
2013-08-01 21:17:14 +02:00
Willy Tarreau
cadd8c9ec3 MINOR: payload: split smp_fetch_rdp_cookie()
This function is also called directly from backend.c, so let's stop
building fake args to call it as a sample fetch, and have a lower
layer more generic function instead.
2013-08-01 21:17:13 +02:00
Willy Tarreau
ef38c39287 MEDIUM: sample: systematically pass the keyword pointer to the keyword
We're having a lot of duplicate code just because of minor variants between
fetch functions that could be dealt with if the functions had the pointer to
the original keyword, so let's pass it as the last argument. An earlier
version used to pass a pointer to the sample_fetch element, but this is not
the best solution for two reasons :
  - fetch functions will solely rely on the keyword string
  - some other smp_fetch_* users do not have the pointer to the original
    keyword and were forced to pass NULL.

So finally we're passing a pointer to the keyword as a const char *, which
perfectly fits the original purpose.
2013-08-01 21:17:13 +02:00
Godbach
a34bdc0ea4 BUG/MEDIUM: server: set the macro for server's max weight SRV_UWGHT_MAX to SRV_UWGHT_RANGE
The max weight of server is 256 now, but SRV_UWGHT_MAX is still 255. As a result,
FWRR will not work well when server's weight is 256. The description is as below:

There are some macros related to server's weight in include/types/server.h:
    #define SRV_UWGHT_RANGE 256
    #define SRV_UWGHT_MAX   (SRV_UWGHT_RANGE - 1)
    #define SRV_EWGHT_MAX   (SRV_UWGHT_MAX   * BE_WEIGHT_SCALE)

Since weight of server can be reach to 256 and BE_WEIGHT_SCALE equals to 16,
the max eweight of server should be 256*16 = 4096, it will exceed SRV_EWGHT_MAX
which equals to SRV_UWGHT_MAX*BE_WEIGHT_SCALE = 255*16 = 4080. When a server
with weight 256 is insterted into FWRR tree during initialization, the key value
of this server should be SRV_EWGHT_MAX - s->eweight = 4080 - 4096 = -16 which
is closed to UINT_MAX in unsigned type, so the server with highest weight will
be not elected as the first server to process request.

In addition, it is a better choice to compare with SRV_UWGHT_MAX than a magic
number 256 while doing check for the weight. The max number of servers for
round-robin algorithm is also updated.

Signed-off-by: Godbach <nylzhaowei@gmail.com>
2013-07-22 09:29:34 +02:00
Lukas Tribus
67db8df12b MEDIUM: http: add IPv6 support for "set-tos"
As per RFC3260 #4 and BCP37 #4.2 and #5.2, the IPv6 counterpart of TOS
is "traffic class".

Add support for IPv6 traffic class in "set-tos" by moving the "set-tos"
related code to the new inline function inet_set_tos(), handling IPv4
(IP_TOS), IPv6 (IPV6_TCLASS) and IPv4-mapped sockets (IP_TOS, like
::ffff:127.0.0.1).

Also define - if missing - the IN6_IS_ADDR_V4MAPPED() macro in
include/common/compat.h for compatibility.
2013-06-23 18:01:38 +02:00
Willy Tarreau
dc13c11c1e BUG/MEDIUM: prevent gcc from moving empty keywords lists into BSS
Benoit Dolez reported a failure to start haproxy 1.5-dev19. The
process would immediately report an internal error with missing
fetches from some crap instead of ACL names.

The cause is that some versions of gcc seem to trim static structs
containing a variable array when moving them to BSS, and only keep
the fixed size, which is just a list head for all ACL and sample
fetch keywords. This was confirmed at least with gcc 3.4.6. And we
can't move these structs to const because they contain a list element
which is needed to link all of them together during the parsing.

The bug indeed appeared with 1.5-dev19 because it's the first one
to have some empty ACL keyword lists.

One solution is to impose -fno-zero-initialized-in-bss to everyone
but this is not really nice. Another solution consists in ensuring
the struct is never empty so that it does not move there. The easy
solution consists in having a non-null list head since it's not yet
initialized.

A new "ILH" list head type was thus created for this purpose : create
an Initialized List Head so that gcc cannot move the struct to BSS.
This fixes the issue for this version of gcc and does not create any
burden for the declarations.
2013-06-21 23:29:02 +02:00
Godbach
430f291a99 CLEANUP: session: remove event_accept() which was not used anymore
Remove event_accept() in include/proto/proto_http.h and use correct function
name in other two files instead of event_accept().

Signed-off-by: Godbach <nylzhaowei@gmail.com>
2013-06-20 08:07:35 +02:00
Willy Tarreau
be4a3eff34 MEDIUM: counters: use sc0/sc1/sc2 instead of sc1/sc2/sc3
It was a bit inconsistent to have gpc start at 0 and sc start at 1,
so make sc start at zero like gpc. No previous release was issued
with sc3 anyway, so no existing setup should be affected.
2013-06-17 15:04:07 +02:00
Thierry FOURNIER
7eeb435494 CLEANUP: protect checks.h from multiple inclusions
The types/checks.h include file isn't protected against multiple inclusions,
so let's surround it with "#ifndef _TYPES_CHECKS_H/#endif" to fix this.
2013-06-14 19:53:02 +02:00
Thierry FOURNIER
d3879e8b57 CLEANUP: fix missing include <string.h> in proto/listener.h
The file proto/listener.h makes use of strdup() but doesn't include
<string.h> so it's sensible to include file ordering.
2013-06-14 19:52:17 +02:00
Willy Tarreau
4f0d919bd4 MEDIUM: tcp: add "tcp-request connection expect-proxy layer4"
This configures the client-facing connection to receive a PROXY protocol
header before any byte is read from the socket. This is equivalent to
having the "accept-proxy" keyword on the "bind" line, except that using
the TCP rule allows the PROXY protocol to be accepted only for certain
IP address ranges using an ACL. This is convenient when multiple layers
of load balancers are passed through by traffic coming from public
hosts.
2013-06-11 20:40:55 +02:00
Willy Tarreau
51347ed94c MEDIUM: http: add the "set-mark" action on http-request/http-response rules
"set-mark" is used to set the Netfilter MARK on all packets sent to the
client to the value passed in <mark> on platforms which support it. This
value is an unsigned 32 bit value which can be matched by netfilter and
by the routing table. It can be expressed both in decimal or hexadecimal
format (prefixed by "0x"). This can be useful to force certain packets to
take a different route (for example a cheaper network path for bulk
downloads). This works on Linux kernels 2.6.32 and above and requires
admin privileges.
2013-06-11 19:34:13 +02:00
Willy Tarreau
42cf39e3b9 MEDIUM: http: add support for "set-tos" in http-request/http-response
This manipulates the TOS field of the IP header of outgoing packets sent
to the client. This can be used to set a specific DSCP traffic class based
on some request or response information. See RFC2474, 2597, 3260 and 4594
for more information.
2013-06-11 19:04:37 +02:00
Willy Tarreau
9a355ec257 MEDIUM: http: add support for action "set-log-level" in http-request/http-response
Some users want to disable logging for certain non-important requests such as
stats requests or health-checks coming from another equipment. Other users want
to log with a higher importance (eg: notice) some special traffic (POST requests,
authenticated requests, requests coming from suspicious IPs) or some abnormally
large responses.

This patch responds to all these needs at once by adding a "set-log-level" action
to http-request/http-response. The 8 syslog levels are supported, as well as "silent"
to disable logging.
2013-06-11 17:50:26 +02:00
Willy Tarreau
abcd5145f8 MEDIUM: log: add a log level override value in struct session
This log level will be used in a further patch to change the log level
depending on the request or response.
2013-06-11 17:50:26 +02:00
Willy Tarreau
f4c43c13be MEDIUM: http: add the "set-nice" action to http-request and http-response
This new action changes the nice factor of the task processing the current
request.
2013-06-11 17:50:26 +02:00
Willy Tarreau
e365c0b92b MEDIUM: http: add a new "http-response" ruleset
Some actions were clearly missing to process response headers. This
patch adds a new "http-response" ruleset which provides the following
actions :
  - allow : stop evaluating http-response rules
  - deny : stop and reject the response with a 502
  - add-header : add a header in log-format mode
  - set-header : set a header in log-format mode
2013-06-11 16:06:12 +02:00
Willy Tarreau
2b57cb8f30 MEDIUM: protocol: implement a "drain" function in protocol layers
Since commit cfd97c6f was merged into 1.5-dev14 (BUG/MEDIUM: checks:
prevent TIME_WAITs from appearing also on timeouts), some valid health
checks sometimes used to show some TCP resets. For example, this HTTP
health check sent to a local server :

  19:55:15.742818 IP 127.0.0.1.16568 > 127.0.0.1.8000: S 3355859679:3355859679(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:15.742841 IP 127.0.0.1.8000 > 127.0.0.1.16568: S 1060952566:1060952566(0) ack 3355859680 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:15.742863 IP 127.0.0.1.16568 > 127.0.0.1.8000: . ack 1 win 257
  19:55:15.745402 IP 127.0.0.1.16568 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
  19:55:15.745488 IP 127.0.0.1.8000 > 127.0.0.1.16568: FP 1:146(145) ack 23 win 257
  19:55:15.747109 IP 127.0.0.1.16568 > 127.0.0.1.8000: R 23:23(0) ack 147 win 257

After some discussion with Chris Huang-Leaver, it appeared clear that
what we want is to only send the RST when we have no other choice, which
means when the server has not closed. So we still keep SYN/SYN-ACK/RST
for pure TCP checks, but don't want to see an RST emitted as above when
the server has already sent the FIN.

The solution against this consists in implementing a "drain" function at
the protocol layer, which, when defined, causes as much as possible of
the input socket buffer to be flushed to make recv() return zero so that
we know that the server's FIN was received and ACKed. On Linux, we can make
use of MSG_TRUNC on TCP sockets, which has the benefit of draining everything
at once without even copying data. On other platforms, we read up to one
buffer of data before the close. If recv() manages to get the final zero,
we don't disable lingering. Same for hard errors. Otherwise we do.

In practice, on HTTP health checks we generally find that the close was
pending and is returned upon first recv() call. The network trace becomes
cleaner :

  19:55:23.650621 IP 127.0.0.1.16561 > 127.0.0.1.8000: S 3982804816:3982804816(0) win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:23.650644 IP 127.0.0.1.8000 > 127.0.0.1.16561: S 4082139313:4082139313(0) ack 3982804817 win 32792 <mss 16396,nop,nop,sackOK,nop,wscale 7>
  19:55:23.650666 IP 127.0.0.1.16561 > 127.0.0.1.8000: . ack 1 win 257
  19:55:23.651615 IP 127.0.0.1.16561 > 127.0.0.1.8000: P 1:23(22) ack 1 win 257
  19:55:23.651696 IP 127.0.0.1.8000 > 127.0.0.1.16561: FP 1:146(145) ack 23 win 257
  19:55:23.652628 IP 127.0.0.1.16561 > 127.0.0.1.8000: F 23:23(0) ack 147 win 257
  19:55:23.652655 IP 127.0.0.1.8000 > 127.0.0.1.16561: . ack 24 win 257

This change should be backported to 1.4 which is where Chris encountered
this issue. The code is different, so probably the tcp_drain() function
will have to be put in the checks only.
2013-06-10 20:33:23 +02:00
Willy Tarreau
04ff9f105f MINOR: http: add full-length header fetch methods
The req.hdr and res.hdr fetch methods do not work well on headers which
are allowed to contain commas, such as User-Agent, Date or Expires.
More specifically, full-length matching is impossible if a comma is
present.

This patch introduces 4 new fetch functions which are designed to work
with these full-length headers :
  - req.fhdr, req.fhdr_cnt
  - res.fhdr, res.fhdr_cnt

These ones do not stop at commas and permit to return full-length header
values.
2013-06-10 18:39:42 +02:00
Willy Tarreau
570f221cbb MINOR: log: add a new flag 'L' for locally processed requests
People who use "option dontlog-normal" are bothered with redirects and
stats being logged and reported as errors in the logs ("PR" = proxy
blocked the request).

This patch introduces a new flag 'L' for when a request is locally
processed, that is not considered as an error by the log filters. That
way we know a request was intercepted and processed by haproxy without
logging the line when "option dontlog-normal" is in effect.
2013-06-10 16:42:09 +02:00
Willy Tarreau
bf43f6eb32 MINOR: defaults: allow REQURI_LEN and CAPTURE_LEN to be redefined
Some users want to change these default settings but it was not
convenient.
2013-06-10 10:30:09 +02:00
Willy Tarreau
379357af58 BUG/MAJOR: http: always ensure response buffer has some room for a response
Since 1.5-dev12 and commit 3bf1b2b8 (MAJOR: channel: stop relying on
BF_FULL to take action), the HTTP parser switched to channel_full()
instead of BF_FULL to decide whether a buffer had enough room to start
parsing a request or response. The problem is that channel_full()
intentionally ignores outgoing data, so a corner case exists where a
large response might still be left in a response buffer with just a
few bytes left (much less than the reserve), enough to accept a second
response past the last data, but not enough to permit the HTTP processor
to add some headers. Since all the processing relies on this space being
available, we can get some random crashes when clients pipeline requests.

The analysis of a core from haproxy configured with 20480 bytes buffers
shows this : with enough "luck", when sending back the response for the
first request, the client is slow, the TCP window is congested, the socket
buffers are full, and haproxy's buffer fills up. We still have 20230 bytes
of response data in a 20480 response buffer. The second request is sent to
the server which returns 214 bytes which fit in the small 250 bytes left
in this buffer. And the buffer arrangement makes it possible to escape all
the controls in http_wait_for_response() :

    |<------ response buffer = 20480 bytes ------>|
    [ 2/2  | 3 | 4 |          1/2                 ]
           ^ start of circular buffer

      1/2 = beginning of previous response (18240)
      2/2 = end of previous response       (1990)
        3 = current response               (214)
        4 = free space                     (36)

  - channel_full() returns false (20230 bytes are going to leave)
  - the response headers does not wrap at the end of the buffer
  - the remaining linear room after the headers is larger than the
    reserve, because it's the previous response which wraps :
  => response is processed

Header rewriting causes it to reach 260 bytes, 10 bytes larger than what
the buffer could hold. So all computations during header addition are
wrong and lead to the corruption we've observed.

All the conditions are very hard to meet (which explains why it took
almost one year for this bug to show up) and are almost impossible to
reproduce on purpose on a test platform. But the bug is clearly there.

This issue was reported by Dinko Korunic who kindly devoted a lot of
time to provide countless traces and cores, and to experiment with
troubleshooting patches to knock the bug down. Thanks Dinko!

No backport is needed, but all 1.5-dev versions between dev12 and dev18
included must be upgraded. A workaround consists in setting option
forceclose to prevent pipelined requests from being processed.
2013-06-08 13:14:17 +02:00
Willy Tarreau
ba2ffd18b5 MEDIUM: counters: add a new "gpc0_rate" counter in stick-tables
This counter is special in that instead of reporting the gpc0 cumulative
count, it returns its increase rate over the configured period.
2013-05-29 15:54:14 +02:00
Willy Tarreau
e25c917af8 MEDIUM: counters: add support for tracking a third counter
We're often missin a third counter to track base, src and base+src at
the same time. Here we introduce track_sc3 to have this third counter.
It would be wise not to add much more counters because that slightly
increases the session size and processing time though the real issue
is more the declaration of the keywords in the code and in the doc.
2013-05-29 00:37:16 +02:00
Willy Tarreau
d5ca9abb0d MINOR: counters: make it easier to extend the amount of tracked counters
By properly affecting the flags and values, it becomes easier to add
more tracked counters, for example for experimentation. It also slightly
reduces the code and the number of tests. No counters were added with
this patch.
2013-05-28 17:43:40 +02:00
Pieter Baauw
1eb7592bba MINOR: tproxy: add support for OpenBSD
OpenBSD uses (SOL_SOCKET, SO_BINDANY) to enable transparent
proxy on a socket.

This patch adds support for the relevant setsockopt() calls.
2013-05-11 08:03:50 +02:00
Pieter Baauw
ff30b6667b MINOR: tproxy: add support for FreeBSD
FreeBSD uses (IPPROTO_IP, IP_BINDANY) and (IPPROTO_IPV6, IPV6_BINDANY)
to enable transparent proxy on a socket.

This patch adds support for the relevant setsockopt() calls.
2013-05-11 08:03:43 +02:00
Pieter Baauw
d551fb5a8d REORG: tproxy: prepare the transparent proxy defines for accepting other OSes
This patch does not change the logic of the code, it only changes the
way OS-specific defines are tested.

At the moment the transparent proxy code heavily depends on Linux-specific
defines. This first patch introduces a new define "CONFIG_HAP_TRANSPARENT"
which is set every time the defines used by transparent proxy are present.
This also means that with an up-to-date libc, it should not be necessary
anymore to force CONFIG_HAP_LINUX_TPROXY during the build, as the flags
will automatically be detected.

The CTTPROXY flags still remain separate because this older API doesn't
work the same way.

A new line has been added in the version output for haproxy -vv to indicate
what transparent proxy support is available.
2013-05-11 08:03:37 +02:00
Willy Tarreau
dd11293e84 BUG/MINOR: acl: fix a double free during exit when using PCRE_JIT
When freeing ACL regex, we don't want to perform the free() in regex_free()
as it's already performed in free_pattern(). The double free only happens
when using PCRE_JIT when freeing everything during exit so it's harmless
but exhibits libc errors during a reload/restart.

Bug reported by Seri.
2013-05-08 18:14:18 +02:00
Emmanuel Hocdet
7c41a1b59b MEDIUM: ssl: improve crt-list format to support negation
Improve the crt-list file format to allow a rule to negate a certain SNI :

        <crtfile> [[!]<snifilter> ...]

This can be useful when a domain supports a wildcard but you don't want to
deliver the wildcard cert for certain specific domains.
2013-05-07 22:11:54 +02:00
de Lafond Guillaume
88c278fadf MEDIUM: stats: add proxy name filtering on the statistic page
This patch adds a "scope" box in the statistics page in order to
display only proxies with a name that contains the requested value.
The scope filter is preserved across all clicks on the page.
2013-04-15 22:50:33 +02:00
Willy Tarreau
62a6123fed BUG/MEDIUM: log: fix regression on log-format handling
Commit a4312fa2 merged into dev18 improved log-format management by
processing "log-format" and "unique-id-format" where they were declared,
so that the faulty args could be reported with their correct line numbers.

Unfortunately, the log-format parser considers the proxy mode (TCP/HTTP)
and now if the directive is set before the "mode" statement, it can be
rejected and report warnings.

So we really need to parse these directives at the end of a section at
least. Right now we do not have an "end of section" event, so we need
to store the file name and line number for each of these directives,
and take care of them at the end.

One of the benefits is that now the line numbers can be inherited from
the line passing "option httplog" even if it's in a defaults section.

Future improvements should be performed to report line numbers in every
log-format processed by the parser.
2013-04-12 18:13:46 +02:00
Willy Tarreau
ab861d3856 MINOR: ssl: add support for the "alpn" bind keyword
The ALPN extension is meant to replace the now deprecated NPN extension.
This patch implements support for it. It requires a version of openssl
with support for this extension. Patches are available here right now :

   http://html5labs.interopbridges.com/media/167447/alpn_patches.zip
2013-04-03 02:13:02 +02:00
Willy Tarreau
a4312fa28e MAJOR: sample: maintain a per-proxy list of the fetch args to resolve
While ACL args were resolved after all the config was parsed, it was not the
case with sample fetch args because they're almost everywhere now.

The issue is that ACLs now solely rely on sample fetches, so their args
resolving doesn't work anymore. And many fetches involving a server, a
proxy or a userlist don't work at all.

The real issue is that at the bottom layers we have no information about
proxies, line numbers, even ACLs in order to report understandable errors,
and that at the top layers we have no visibility over the locations where
fetches are referenced (think log node).

After failing multiple unsatisfying solutions attempts, we now have a new
concept of args list. The principle is that every proxy has a list head
which contains a number of indications such as the config keyword, the
context where it's used, the file and line number, etc... and a list of
arguments. This list head is of the same type as the elements, so it
serves as a template for adding new elements. This way, it is filled from
top to bottom by the callers with the information they have (eg: line
numbers, ACL name, ...) and the lower layers just have to duplicate it and
add an element when they face an argument they cannot resolve yet.

Then at the end of the configuration parsing, a loop passes over each
proxy's list and resolves all the args in sequence. And this way there is
all necessary information to report verbose errors.

The first immediate benefit is that for the first time we got very precise
location of issues (arg number in a keyword in its context, ...). Second,
in order to do this we had to parse log-format and unique-id-format a bit
earlier, so that was a great opportunity for doing so when the directives
are encountered (unless it's a default section). This way, the recorded
line numbers for these args are the ones of the place where the log format
is declared, not the end of the file.

Userlists report slightly more information now. They're the only remaining
ones in the ACL resolving function.
2013-04-03 02:13:02 +02:00
Willy Tarreau
93fddf1dbc MEDIUM: acl: have a pointer to the keyword name in acl_expr
The acl_expr struct used to hold a pointer to the ACL keyword. But since
we now have all the relevant pointers, we don't need that anymore, we just
need the pointer to the keyword as a string in order to return warnings
and error messages.

So let's change this in order to remove the dependency on the acl_keyword
struct from acl_expr.

During this change, acl_cond_kw_conflicts() used to return a pointer to an
ACL keyword but had to be changed to return a const char* for the same reason.
2013-04-03 02:13:01 +02:00
Willy Tarreau
acca90d8e5 MINOR: acl: remove the use_count in acl keywords
use_cnt is not used at all anymore, let's get rid of it.
2013-04-03 02:13:01 +02:00
Willy Tarreau
5adeda1f63 MAJOR: acl: add option -m to change the pattern matching method
ACL expressions now support "-m" in addition to "-i" and "-f". This new
option is followed by the name of the pattern matching method to be used
on the extracted pattern. This makes it possible to reuse existing sample
fetch methods with other matching methods (eg: regex). A "found" matching
method ignores any pattern and only verifies that the required sample was
found (useful for cookies).
2013-04-03 02:13:01 +02:00
Willy Tarreau
d76a98a5fc MEDIUM: acl: move the ->parse, ->match and ->smp fields to acl_expr
We'll need each ACL expression to be able to support its own parse and
match methods, so we're moving these fields to the ACL expression.
2013-04-03 02:13:01 +02:00
Willy Tarreau
d86e29d2a1 CLEANUP: acl: remove unused references to ACL_USE_*
Now that acl->requires is not used anymore, we can remove all references
to it as well as all ACL_USE_* flags.
2013-04-03 02:13:00 +02:00
Willy Tarreau
a91d0a583c MAJOR: acl: convert all ACL requires to SMP use+val instead of ->requires
The ACLs now use the fetch's ->use and ->val to decide upon compatibility
between the place where they are used and where the information are fetched.
The code is capable of reporting warnings about very fine incompatibilities
between certain fetches and an exact usage location, so it is expected that
some new warnings will be emitted on some existing configurations.

Two degrees of detection are provided :
  - detecting ACLs that never match
  - detecting keywords that are ignored

All tests show that this seems to work well, though bugs are still possible.
2013-04-03 02:13:00 +02:00
Willy Tarreau
bf8e251077 MINOR: sample: provide a function to report the name of a sample check point
We need to put names on places where samples are used in order to emit warnings
and errors. Let's do that now.
2013-04-03 02:13:00 +02:00
Willy Tarreau
25320b2906 MEDIUM: proxy: remove acl_requires and just keep a flag "http_needed"
Proxy's acl_requires was a copy of all bits taken from ACLs, but we'll
get rid of ACL flags and only rely on sample fetches soon. The proxy's
acl_requires was only used to allocate an HTTP context when needed, and
was even forced in HTTP mode. So better have a flag which exactly says
what it's supposed to be used for.
2013-04-03 02:13:00 +02:00
Willy Tarreau
4a96bf5a5d CLEANUP: acl: remove ACL hooks which were never used
These hooks, which established the relation between ACL_USE_* and the location
where the ACL were used, were never used because they were superseded with the
sample capabilities. Remove them now.
2013-04-03 02:12:59 +02:00
Willy Tarreau
9baae63d8d MAJOR: acl: remove fetch argument validation from the ACL struct
ACL fetch being inherited from the sample fetch keyword, we don't need
anymore to specify what function to use to validate the fetch arguments.

Note that the job is still done in the ACL parsing code based on elements
from the sample fetch structs.
2013-04-03 02:12:59 +02:00
Willy Tarreau
c48c90dfa5 MAJOR: acl: remove the arg_mask from the ACL definition and use the sample fetch's
Now that ACLs solely rely on sample fetch functions, make them use the
same arg mask. All inconsistencies have been fixed separately prior to
this patch, so this patch almost only adds a new pointer indirection
and removes all references to ARG*() in the definitions.

The parsing is still performed by the ACL code though.
2013-04-03 02:12:58 +02:00
Willy Tarreau
8ed669b12a MAJOR: acl: make all ACLs reference the fetch function via a sample.
ACL fetch functions used to directly reference a fetch function. Now
that all ACL fetches have their sample fetches equivalent, we can make
ACLs reference a sample fetch keyword instead.

In order to simplify the code, a sample keyword name may be NULL if it
is the same as the ACL's, which is the most common case.

A minor change appeared, http_auth always expects one argument though
the ACL allowed it to be missing and reported as such afterwards, so
fix the ACL to match this. This is not really a bug.
2013-04-03 02:12:58 +02:00
Willy Tarreau
d4c33c8889 MEDIUM: samples: move payload-based fetches and ACLs to their own file
The file acl.c is a real mess, it both contains functions to parse and
process ACLs, and some sample extraction functions which act on buffers.
Some other payload analysers were arbitrarily dispatched to proto_tcp.c.

So now we're moving all payload-based fetches and ACLs to payload.c
which is capable of extracting data from buffers and rely on everything
that is protocol-independant. That way we can safely inflate this file
and only use the other ones when some fetches are really specific (eg:
HTTP, SSL, ...).

As a result of this cleanup, the following new sample fetches became
available even if they're not really useful :

  always_false, always_true, rep_ssl_hello_type, rdp_cookie_cnt,
  req_len, req_ssl_hello_type, req_ssl_sni, req_ssl_ver, wait_end

The function 'acl_fetch_nothing' was wrong and never used anywhere so it
was removed.

The "rdp_cookie" sample fetch used to have a mandatory argument while it
was optional in ACLs, which are supposed to iterate over RDP cookies. So
we're making it optional as a fetch too, and it will return the first one.
2013-04-03 02:12:57 +02:00
Willy Tarreau
434c57c95c MINOR: log: indicate it when some unreliable sample fetches are logged
If a log-format involves some sample fetches that may not be present at
the logging instant, we can now report a warning.

Note that this is done both for log-format and for add-header and carefully
respects the original fetch keyword's capabilities.
2013-04-03 02:12:56 +02:00
Willy Tarreau
80aca90ad2 MEDIUM: samples: use new flags to describe compatibility between fetches and their usages
Samples fetches were relying on two flags SMP_CAP_REQ/SMP_CAP_RES to describe
whether they were compatible with requests rules or with response rules. This
was never reliable because we need a finer granularity (eg: an HTTP request
method needs to parse an HTTP request, and is available past this point).

Some fetches are also dependant on the context (eg: "hdr" uses request or
response depending where it's involved, causing some abiguity).

In order to solve this, we need to precisely indicate in fetches what they
use, and their users will have to compare with what they have.

So now we have a bunch of bits indicating where the sample is fetched in the
processing chain, with a few variants indicating for some of them if it is
permanent or volatile (eg: an HTTP status is stored into the transaction so
it is permanent, despite being caught in the response contents).

The fetches also have a second mask indicating their validity domain. This one
is computed from a conversion table at registration time, so there is no need
for doing it by hand. This validity domain consists in a bitmask with one bit
set for each usage point in the processing chain. Some provisions were made
for upcoming controls such as connection-based TCP rules which apply on top of
the connection layer but before instantiating the session.

Then everywhere a fetch is used, the bit for the control point is checked in
the fetch's validity domain, and it becomes possible to finely ensure that a
fetch will work or not.

Note that we need these two separate bitfields because some fetches are usable
both in request and response (eg: "hdr", "payload"). So the keyword will have
a "use" field made of a combination of several SMP_USE_* values, which will be
converted into a wider list of SMP_VAL_* flags.

The knowledge of permanent vs dynamic information has disappeared for now, as
it was never used. Later we'll probably reintroduce it differently when
dealing with variables. Its only use at the moment could have been to avoid
caching a dynamic rate measurement, but nothing is cached as of now.
2013-04-03 02:12:56 +02:00
Willy Tarreau
e0db1e8946 MEDIUM: acl: remove flag ACL_MAY_LOOKUP which is improperly used
This flag is used on ACL matches that support being looking up patterns
in trees. At the moment, only strings and IPs support tree-based lookups,
but the flag is randomly set also on integers and binary data, and is not
even always set on strings nor IPs.

Better get rid of this mess by only relying on the matching function to
decide whether or not it supports tree-based lookups, this is safer and
easier to maintain.
2013-04-03 02:12:56 +02:00
Lukas Tribus
0999f7662c BUILD: add explicit support for TFO with USE_TFO
TCP Fast Open is supported in server mode since Linux 3.7, but current
libc's don't define TCP_FASTOPEN=23. Introduce the new USE flag USE_TFO
to define it manually in compat.h. Also note this in the TFO related
documentation.
2013-04-02 17:40:43 +02:00
Willy Tarreau
0161d62d23 OPTIM: http: improve branching in chunk size parser
By tweaking a bit some conditions in http_parse_chunk_size(), we could
improve the overall performance in the worst case by 15%.
2013-04-02 02:00:57 +02:00
Willy Tarreau
bf43927cd7 OPTIM: buffer: remove one jump in buffer_count()
We can help gcc build an expression not involving a jump. This function
is used a lot when parsing chunks.
2013-04-02 01:25:57 +02:00
Hiroaki Nakamura
7035132349 MEDIUM: regex: Use PCRE JIT in acl
This is a patch for using PCRE JIT in acl.

I notice regex are used in other places, but they are more complicated
to modify to use PCRE APIs. So I focused to acl in the first try.

BTW, I made a simple benchmark program for PCRE JIT beforehand.
https://github.com/hnakamur/pcre-jit-benchmark

I read the manual for PCRE JIT
http://www.manpagez.com/man/3/pcrejit/

and wrote my benchmark program.
https://github.com/hnakamur/pcre-jit-benchmark/blob/master/test-pcre.c
2013-04-02 00:02:54 +02:00
Willy Tarreau
dad36a3ee3 MAJOR: tools: support environment variables in addresses
Now that all addresses are parsed using str2sa_range(), it becomes easy
to add support for environment variables and use them everywhere an address
is needed. Environment variables are used as $VAR or ${VAR} as in shell.
Any number of variables may compose an address, allowing various fantasies
such as "fd@${FD_HTTP}" or "${LAN_DC1}.1:80".

These ones are usable in logs, bind, servers, peers, stats socket, source,
dispatch, and check address.
2013-03-11 01:30:02 +01:00
Willy Tarreau
24709286fe MEDIUM: tools: support specifying explicit address families in str2sa_range()
This change allows one to force the address family in any address parsed
by str2sa_range() by specifying it as a prefix followed by '@' then the
address. Currently supported address prefixes are 'ipv4@', 'ipv6@', 'unix@'.
This also helps forcing resolving for host names (when getaddrinfo is used),
and force the family of the empty address (eg: 'ipv4@' = 0.0.0.0 while
'ipv6@' = ::).

The main benefits is that unix sockets can now get a local name without
being forced to begin with a slash. This is useful during development as
it is no longer necessary to have stats socket sent to /tmp.
2013-03-10 22:46:55 +01:00
Willy Tarreau
c120c8d347 CLEANUP: minor cleanup in str2sa_range() and str2ip()
Don't use a statically allocated address both for str2ip and str2sa_range,
use the same. The inet and unix code paths have been splitted a little
better to improve readability.
2013-03-10 21:36:31 +01:00
Willy Tarreau
add0ab1975 CLEANUP: tools: remove str2sun() which is not used anymore. 2013-03-08 14:04:54 +01:00
Willy Tarreau
d393a628bb MINOR: tools: prepare str2sa_range() to accept a prefix
We'll need str2sa_range() to support a prefix for unix sockets. Since
we don't always want to use it (eg: stats socket), let's not take it
unconditionally from global but let the caller pass it.
2013-03-08 14:04:54 +01:00
Willy Tarreau
df350f1f48 MINOR: tools: prepare str2sa_range() to return an error message
We'll need str2sa_range() to return address parsing errors if we want to
extend its functionalities. Let's do that now eventhough it's not used
yet.
2013-03-08 14:04:53 +01:00
Emeric Brun
6924ef8b12 BUG/MEDIUM: ssl: ECDHE ciphers not usable without named curve configured.
Fix consists to use prime256v1 as default named curve to init ECDHE ciphers if none configured.
2013-03-06 19:08:26 +01:00
Willy Tarreau
4f4b18b2ec BUILD/MINOR: syscall: add definition of NR_accept4 for ARM
This platform was not covered and older libc do not provide accept4().
2013-03-04 07:38:08 +01:00
Willy Tarreau
b26cc86b1c BUG/MINOR: syscall: fix NR_accept4 system call on sparc/linux
An invalid copy-paste called it NR_splice instead of NR_accept4.
This does not lead to real issues because if this define is used,
then the code cannot compile since NR_accept4 is still missing.
2013-03-04 07:31:08 +01:00
Willy Tarreau
bfd5946aa1 MINOR: ssl: add a global tunable for the max SSL/TLS record size
Add new tunable "tune.ssl.maxrecord".

Over SSL/TLS, the client can decipher the data only once it has received
a full record. With large records, it means that clients might have to
download up to 16kB of data before starting to process them. Limiting the
record size can improve page load times on browsers located over high
latency or low bandwidth networks. It is suggested to find optimal values
which fit into 1 or 2 TCP segments (generally 1448 bytes over Ethernet
with TCP timestamps enabled, or 1460 when timestamps are disabled), keeping
in mind that SSL/TLS add some overhead. Typical values of 1419 and 2859
gave good results during tests. Use "strace -e trace=write" to find the
best value.

This trick was first suggested by Mike Belshe :

   http://www.belshe.com/2010/12/17/performance-and-the-tls-record-size/

Then requested again by Ilya Grigorik who provides some hints here :

   http://ofps.oreilly.com/titles/9781449344764/_transport_layer_security_tls.html#ch04_00000101
2013-02-21 07:53:13 +01:00
Willy Tarreau
d4448bc836 MEDIUM: tools: make str2sa_range support all address syntaxes
Right now we have multiple methods for parsing IP addresses in the
configuration. This is quite painful. This patch aims at adapting
str2sa_range() to make it support all formats, so that the callers
perform the appropriate tests on the return values. str2sa() was
changed to simply return str2sa_range().

The output values are now the following ones (taken from the comment
on top of the function).

  Converts <str> to a locally allocated struct sockaddr_storage *, and a port
  range or offset consisting in two integers that the caller will have to
  check to find the relevant input format. The following format are supported :

    String format           | address |  port  |  low   |  high
     addr                   | <addr>  |   0    |   0    |   0
     addr:                  | <addr>  |   0    |   0    |   0
     addr:port              | <addr>  | <port> | <port> | <port>
     addr:pl-ph             | <addr>  |  <pl>  |  <pl>  |  <ph>
     addr:+port             | <addr>  | <port> |   0    | <port>
     addr:-port             | <addr>  |-<port> | <port> |   0

  The detection of a port range or increment by the caller is made by
  comparing <low> and <high>. If both are equal, then port 0 means no port
  was specified. The caller may pass NULL for <low> and <high> if it is not
  interested in retrieving port ranges.

  Note that <addr> above may also be :
    - empty ("")  => family will be AF_INET and address will be INADDR_ANY
    - "*"         => family will be AF_INET and address will be INADDR_ANY
    - "::"        => family will be AF_INET6 and address will be IN6ADDR_ANY
    - a host name => family and address will depend on host name resolving.
2013-02-20 17:29:30 +01:00
Lukas Tribus
0defb90784 DOC: tfo: bump required kernel to linux-3.7
Support for server side TFO was actually introduced in linux-3.7,
linux-3.6 just has client support.

This patch fixes documentation and a code comment about the
kernel requirement. It also fixes a wrong tfo related code
comment in src/proto_tcp.c.
2013-02-14 00:03:04 +01:00
Simon Horman
a2b9dadedd MEDIUM: checks: Add agent health check
Support a agent health check performed by opening a TCP socket to a
pre-defined port and reading an ASCII string. The string should have one of
the following forms:

* An ASCII representation of an positive integer percentage.
  e.g. "75%"

  Values in this format will set the weight proportional to the initial
  weight of a server as configured when haproxy starts.

* The string "drain".

  This will cause the weight of a server to be set to 0, and thus it will
  not accept any new connections other than those that are accepted via
  persistence.

* The string "down", optionally followed by a description string.

  Mark the server as down and log the description string as the reason.

* The string "stopped", optionally followed by a description string.

  This currently has the same behaviour as down (iii).

* The string "fail", optionally followed by a description string.

  This currently has the same behaviour as down (iii).

A agent health check may be configured using "option lb-agent-chk".
The use of an alternate check-port, used to obtain agent heath check
information described above as opposed to the port of the service,
may be useful in conjunction with this option.

e.g.

    option  lb-agent-chk
    server  http1_1 10.0.0.10:80 check port 10000 weight 100

Signed-off-by: Simon Horman <horms@verge.net.au>
2013-02-13 11:03:28 +01:00
Simon Horman
7d09b9a4df MEDIUM: server: Break out set weight processing code
Break out set weight processing code.
This is in preparation for reusing the code.

Also, remove duplicate check in nested if clauses.
{px->lbprm.algo & BE_LB_PROP_DYN) is checked by
the immediate outer if clause, so there is no need
to check it a second time.

Signed-off-by: Simon Horman <horms@verge.net.au>
2013-02-13 10:53:40 +01:00
Simon Horman
5269cfb458 BUG/MINOR: Correct logic in cut_crlf()
This corrects what appears to be logic errors in cut_crlf().
I assume that the intention of this function is to truncate a
string at the first cr or lf. However, currently lf are ignored.

Also use '\0' instead of 0 as the null character, a cosmetic change.

Cc: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Simon Horman <horms@verge.net.au>

[WT: this fix may be backported to 1.4 too]
2013-02-13 10:52:40 +01:00
Marc-Antoine Perennou
992709bad0 MEDIUM: New cli option -Ds for systemd compatibility
This patch adds a new option "-Ds" which is exactly like "-D", but instead of
forking n times to get n jobs running and then exiting, prefers to wait for all the
children it just created. With this done, haproxy becomes more systemd-compliant,
without changing anything for other systems.

Signed-off-by: Marc-Antoine Perennou <Marc-Antoine@Perennou.com>
2013-02-13 10:47:49 +01:00
Willy Tarreau
6cbbdbf3f3 BUG/MEDIUM: log: emit '-' for empty fields again
Commit 2b0108ad accidently got rid of the ability to emit a "-" for
empty log fields. This can happen for captured request and response
cookies, as well as for fetches. Since we don't want to have this done
for headers however, we set the default log method when parsing the
format. It is still possible to force the desired mode using +M/-M.
2013-02-05 18:55:09 +01:00
Emmanuel Hocdet
656233715e MEDIUM: ssl: add bind-option "strict-sni"
This new option ensures that there is no possible fallback to a default
certificate if the client does not provide an SNI which is explicitly
handled by a certificate.
2013-01-24 17:23:33 +01:00
Willy Tarreau
8dc21faaf7 BUG/MINOR: unix: remove the 'level' field from the ux struct
Commit 290e63aa moved the unix parameters out of the global stats socket
to the bind_conf struct. As such the stats admin level was also moved
overthere, but it remained in the stats global section where it was not
used, except by a nasty memcpy() used to initialize the ux struct in the
bind_conf with too large data. Fortunately, the extra data copied were
the previous level over the new level so it did not have any impact, but
it could have been worse.

This bug is 1.5 specific, no backport is needed.

Reported-by: Dinko Korunic <dkorunic@reflected.net>
2013-01-24 16:19:19 +01:00
Willy Tarreau
eab777c32e BUG/MINOR: time: frequency counters are not totally accurate
When a frontend is rate-limited to 1000 connections per second, the
effective rate measured from the client is 999/s, and connections
experience an average response time of 99.5 ms with a standard
deviation of 2 ms.

The reason for this inaccuracy is that when computing frequency
counters, we use one part of the previous value proportional to the
number of milliseconds remaining in the current second. But even the
last millisecond still uses a part of the past value, which is wrong :
since we have a 1ms resolution, the last millisecond must be dedicated
only to filling the current second.

So we slightly adjust the algorithm to use 999/1000 of the past value
during the first millisecond, and 0/1000 of the past value during the
last millisecond.  We also slightly improve the computation by computing
the remaining time instead of the current time in tv_update_date(), so
that we don't have to negate the value in each frequency counter.

Now with the fix, the connection rate measured by both the client and
haproxy is a steady 1000/s, the average response time measured is 99.2ms
and more importantly, the standard deviation has been divided by 3 to
0.6 millisecond.

This fix should also be backported to 1.4 which has the same issue.
2012-12-29 21:50:07 +01:00
Emeric Brun
22890a1225 MINOR: ssl: Setting global tune.ssl.cachesize value to 0 disables SSL session cache. 2012-12-28 14:48:13 +01:00
Willy Tarreau
ccbcc37a01 MEDIUM: http: add support for "http-request tarpit" rule
The "reqtarpit" rule is not very handy to use. Now that we have more
flexibility with "http-request", let's finally make the tarpit rules
usable there.

There are still semantical differences between apply_filters_to_request()
and http_req_get_intercept_rule() because the former updates the counters
while the latter does not. So we currently have almost similar code leafs
for similar conditions, but this should be cleaned up later.
2012-12-28 14:47:19 +01:00
Willy Tarreau
81499eb67d MEDIUM: http: add support for "http-request redirect" rules
These are exactly the same as the classic redirect rules except
that they can be interleaved with other http-request rules for
more flexibility.

The redirect parser should probably be changed to stop at the condition
so that the caller puts its own condition pointer. At the moment, the
redirect rule and condition are parsed at once by build_redirect_rule()
and the condition is assigned to the http_req_rule.
2012-12-28 14:47:19 +01:00
Willy Tarreau
4baae248fc REORG: config: move the http redirect rule parser to proto_http.c
We'll have to use this elsewhere soon, let's move it to the proper
place.
2012-12-28 14:47:19 +01:00
Willy Tarreau
71241abfd3 MINOR: http: move redirect rule processing to its own function
We now have http_apply_redirect_rule() which does all the redirect-specific
job instead of having this inside http_process_req_common().

Also one of the benefit gained from uniformizing this code is that both
keep-alive and close response do emit the PR-- flags. The fix for the
flags could probably be backported to 1.4 though it's very minor.

The previous function http_perform_redirect() was becoming confusing
so it was renamed http_perform_server_redirect() since it only applies
to server-based redirection.
2012-12-28 14:47:19 +01:00
Willy Tarreau
d79a3b248e BUG/MINOR: log: make log-format, unique-id-format and add-header more independant
It happens that all of them call parse_logformat_line() which sets
proxy->to_log with a number of flags affecting the line format for
all three users. For example, having a unique-id specified disables
the default log-format since fe->to_log is tested when the session
is established.

Similarly, having "option logasap" will cause "+" to be inserted in
unique-id or headers referencing some of the fields depending on
LW_BYTES.

This patch first removes most of the dependency on fe->to_log whenever
possible. The first possible cleanup is to stop checking fe->to_log
for being null, considering that it always contains at least LW_INIT
when any such usage is made of the log-format!

Also, some checks are wrong. s->logs.logwait cannot be nulled by
"logwait &= ~LW_*" since LW_INIT is always there. This results in
getting the wrong log at the end of a request or session when a
unique-id or add-header is set, because logwait is still not null
but the log-format is not checked.

Further cleanups are required. Most LW_* flags should be removed or at
least replaced with what they really mean (eg: depend on client-side
connection, depend on server-side connection, etc...) and this should
only affect logging, not other mechanisms.

This patch fixes the default log-format and tries to limit interferences
between the log formats, but does not pretend to do more for the moment,
since it's the most visible breakage.
2012-12-28 09:51:00 +01:00
Willy Tarreau
20b0de56d4 MEDIUM: http: add http-request 'add-header' and 'set-header' to build headers
These two new statements allow to pass information extracted from the request
to the server. It's particularly useful for passing SSL information to the
server, but may be used for various other purposes such as combining headers
together to emulate internal variables.
2012-12-24 15:56:20 +01:00
Willy Tarreau
b83bc1e1c1 MINOR: log: make parse_logformat_string() take a const char *
Sometimes we can't pass a char *, and there is no need for this since we strdup() it.
2012-12-24 12:36:33 +01:00
Willy Tarreau
5c2e198390 MINOR: http: prepare to support more http-request actions
We'll need to support per-action arguments, so we need to have an
"arg" union in http_req_rule.
2012-12-24 12:26:26 +01:00
Willy Tarreau
354898bba9 MINOR: stats: replace STAT_FMT_CSV with STAT_FMT_HTML
We need to switch the default mode if we want to add new output formats
later. Let CSV be the default and HTML be an option.
2012-12-23 21:46:30 +01:00
Willy Tarreau
56adcf2cc9 MINOR: tools: simplify the use of the int to ascii macros
These macros (U2H, U2A, LIM2A, ...) have been used with an explicit
index for the local storage variable, making it difficult to change
log formats and causing a few issues from time to time. Let's have
a single macro with a rotating index so that up to 10 conversions
may be used in a single call.
2012-12-23 21:46:30 +01:00
Willy Tarreau
47ca54505c MINOR: chunks: centralize the trash chunk allocation
At the moment, we need trash chunks almost everywhere and the only
correctly implemented one is in the sample code. Let's move this to
the chunks so that all other places can use this allocator.

Additionally, the get_trash_chunk() function now really returns two
different chunks. Previously it used to always overwrite the same
chunk and point it to a different buffer, which was a bit tricky
because it's not obvious that two consecutive results do alias each
other.
2012-12-23 21:46:07 +01:00
Willy Tarreau
d9bdcd5139 REORG: stats: massive code reorg and cleanup
The dumpstats code looks like a spaghetti plate. Several functions are
supposed to be able to do several things but rely on complex states to
dispatch the work to independant functions. Most of the HTML output is
performed within the switch/case statements of the whole state machine.

Let's clean this up by adding new functions to emit the data and have
a few more iterators to avoid relying on so complex states.

The new stats dump sequence looks like this for CLI and for HTTP :

  cli_io_handler()
      -> stats_dump_sess_to_buffer()      // "show sess"
      -> stats_dump_errors_to_buffer()    // "show errors"
      -> stats_dump_raw_info_to_buffer()  // "show info"
         -> stats_dump_raw_info()
      -> stats_dump_raw_stat_to_buffer()  // "show stat"
         -> stats_dump_csv_header()
         -> stats_dump_proxy()
            -> stats_dump_px_hdr()
            -> stats_dump_fe_stats()
            -> stats_dump_li_stats()
            -> stats_dump_sv_stats()
            -> stats_dump_be_stats()
            -> stats_dump_px_end()

  http_stats_io_handler()
      -> stats_http_redir()
      -> stats_dump_http()              // also emits the HTTP headers
         -> stats_dump_html_head()      // emits the HTML headers
         -> stats_dump_csv_header()     // emits the CSV headers (same as above)
         -> stats_dump_http_info()      // note: ignores non-HTML output
         -> stats_dump_proxy()          // same as above
         -> stats_dump_http_end()       // emits HTML trailer
2012-12-22 20:45:02 +01:00
Willy Tarreau
c83684519b MEDIUM: log: add the ability to include samples in logs
Using %[expression] it becomes possible to make the log engine fetch
some samples from the request or the response and provide them in the
logs. Note that this feature is still limited, it does not yet allow
to apply converters, to limit the output length, nor to specify the
direction which should be fetched when a fetch function works in both
directions.

However it's quite convenient to log SSL information or to include some
information that are used in stick tables.

It is worth noting that this has been done in the generic log format
handler, which means that the same information may be used to build the
unique-id header and to pass the information to a backend server.
2012-12-21 19:24:49 +01:00
Willy Tarreau
2b0108adf6 MINOR: log: add lf_text_len
This function allows to log a text of a specific length.
2012-12-21 19:24:48 +01:00
Willy Tarreau
e7ad4bb2f0 MINOR: samples: add a function to fetch and convert any sample to a string
Any sample type can now easily be converted to a string that can be used
anywhere. This will be used for logging and passing information in headers.
2012-12-21 17:57:24 +01:00
Willy Tarreau
8a3f52fc2e MEDIUM: log-format: make the format parser more robust and more extensible
The log-format parser reached a limit making it hard to add new features.
It also suffers from a weak handling of certain incorrect corner cases,
for example "%{foo}" is emitted as a litteral while syntactically it's an
argument to no variable. Also the argument parser had to redo some of the
job with some cases causing minor memory leaks (eg: ignored args).

This work aims at improving the situation so that slightly better reporting
is possible and that it becomes possible to extend the log format. The code
has a few more states but looks significantly simpler. The parser is now
capable of reporting ignored arguments and truncated lines.
2012-12-20 23:34:20 +01:00
Willy Tarreau
c5259fdc57 MINOR: log: add a tag for amount of bytes uploaded from client to server
For POST, PUT, CONNECT or tunnelled connections, it's annoying not to have
the amount of uploaded bytes in the logs. %U now reports this value.
2012-12-20 15:38:04 +01:00
Willy Tarreau
5fb3803f4b CLEANUP: buffer: use buffer_empty() instead of buffer_len()==0
A few places still made use of buffer_len()==0 to detect an empty
buffer. Use the cleaner and more efficient buffer_empty() instead.
2012-12-17 01:14:49 +01:00
Willy Tarreau
7d28149e92 BUG/MEDIUM: connection: always update connection flags prior to computing polling
stream_int_chk_rcv_conn() did not clear connection flags before updating them. It
is unsure whether this could have caused the stalled transfers that have been
reported since dev15.

In order to avoid such further issues, we now use a simple inline function to do
all the job.
2012-12-17 01:14:25 +01:00
Willy Tarreau
4a29144591 OPTIM: poll: optimize fd management functions for low register count CPUs
Looking at the assembly code that updt_fd() and alloc/release_spec_entry
produce in the polling loops, it's clear that gcc has to recompute pointers
several times in a row because of limited spare registers. By better
grouping adjacent structure updates, we improve the code size by around
60 bytes in the fast path on x86.
2012-12-13 23:34:18 +01:00
Willy Tarreau
20d46a5a95 CLEANUP: session: use an array for the stick counters
The stick counters were in two distinct sets of struct members,
causing some code to be duplicated. Now we use an array, which
enables some processing to be performed in loops. This allowed
the code to be shrunk by 700 bytes.
2012-12-09 15:57:16 +01:00
Willy Tarreau
5d5b5d8eaf MEDIUM: proto_tcp: add support for tracking L7 information
Until now it was only possible to use track-sc1/sc2 with "src" which
is the IPv4 source address. Now we can use track-sc1/sc2 with any fetch
as well as any transformation type. It works just like the "stick"
directive.

Samples are automatically converted to the correct types for the table.

Only "tcp-request content" rules may use L7 information, and such information
must already be present when the tracking is set up. For example it becomes
possible to track the IP address passed in the X-Forwarded-For header.

HTTP request processing now also considers tracking from backend rules
because we want to be able to update the counters even when the request
was already parsed and tracked.

Some more controls need to be performed (eg: samples do not distinguish
between L4 and L6).
2012-12-09 14:08:47 +01:00
Willy Tarreau
ef9a360555 MEDIUM: connection: introduce "struct conn_src" for servers and proxies
Both servers and proxies share a common set of parameters for outgoing
connections, and since they're not stored in a similar structure, a lot
of code is duplicated in the connection setup, which is one sensible
area.

Let's first define a common struct for these settings and make use of it.
Next patches will de-duplicate code.

This change also fixes a build breakage that happens when USE_LINUX_TPROXY
is not set but USE_CTTPROXY is set, which seem to be very unlikely
considering that the issue was introduced almost 2 years ago an never
reported.
2012-12-09 10:04:39 +01:00
Willy Tarreau
02777a1df5 CLEANUP: connection: remove unused server/proxy/task/si_applet declarations
These ones are left-overs from the code before the introduction of
obj_type.
2012-12-08 21:43:36 +01:00
Willy Tarreau
55e4ecd928 MINOR: stats: add a few more information on session dump
We also report fd.spec_p, fd.updt and a few names instead of the values.
2012-12-08 17:48:47 +01:00
Emeric Brun
af9619da3e MEDIUM: ssl: manage shared cache by blocks for huge sessions.
Sessions using client certs are huge (more than 1 kB) and do not fit
in session cache, or require a huge cache.

In this new implementation sshcachesize set a number of available blocks
instead a number of available sessions.

Each block is large enough (128 bytes) to store a simple session (without
client certs).

Huge sessions will take multiple blocks depending on client certificate size.

Note: some unused code for session sync with remote peers was temporarily
      removed.
2012-12-04 10:56:56 +01:00
Willy Tarreau
20879a0233 MEDIUM: connection: add error reporting for the SSL
Get a bit more info in the logs when client-side SSL handshakes fail.
2012-12-03 17:21:52 +01:00
Willy Tarreau
8e3bf699db MEDIUM: connection: add error reporting for the PROXY protocol header
When the PROXY protocol header is expected and fails, leading to an
abort of the incoming connection, we now emit a log message. If option
dontlognull is set and it was just a port probe, then nothing is logged.
2012-12-03 17:21:51 +01:00
Willy Tarreau
0af2912fd1 MEDIUM: connection: add minimal error reporting in logs for incomplete connections
Since the introduction of SSL, it became quite annoying not to get any useful
info in logs about handshake failures. Let's improve reporting for embryonic
sessions by checking a per-connection error code and reporting it into the logs
if an error happens before the session is completely instanciated.

The "dontlognull" option is supported in that if a connection does not talk
before being aborted, nothing will be emitted.

At the moment, only timeouts are considered for SSL and the PROXY protocol,
but next patches will handle more errors.
2012-12-03 15:38:23 +01:00
Willy Tarreau
14cba4b0b1 MEDIUM: connection: add an error code in connections
This will be needed to improve error reporting, especially for SSL.
2012-12-03 14:22:13 +01:00
Emeric Brun
786991e8b7 BUG/MEDIUM: ssl: Fix handshake failure on session resumption with client cert.
Openssl session_id_context was not set on cached sessions so handshake returns an error.
2012-11-26 18:43:21 +01:00
Willy Tarreau
77e3af9e6f MINOR: tcp: add support for the "v4v6" bind option
Commit 9b6700f added "v6only". As suggested by Vincent Bernat, it is
sometimes useful to have the opposite option to force binding to the
two protocols when the system is configured to bind to v6 only by
default. This option does exactly this. v6only still has precedence.
2012-11-24 15:07:23 +01:00
Willy Tarreau
5e16cbc3bd MINOR: stats: report the total number of compressed responses per front/back
Depending on the content-types and accept-encoding fields, some responses
might or might not be compressed. Let's have a counter of the number of
compressed responses and report it in the stats to help improve compression
usage.

Some cosmetic issues were fixed in the CSV output too (missing commas at the
end).
2012-11-24 14:54:13 +01:00
Willy Tarreau
9b6700f673 MINOR: tcp: add support for the "v6only" bind option
This option forces a socket to bind to IPv6 only when it uses the
default address (eg: ":::80").
2012-11-24 12:20:28 +01:00
Willy Tarreau
36fb02c526 BUG/MEDIUM: connection: always disable polling upon error
Commit 0ffde2cc in 1.5-dev13 tried to always disable polling on file
descriptors when errors were encountered. Unfortunately it did not
always succeed in doing so because it relied on detecting polling
changes to disable it. Let's use a dedicated conn_stop_polling()
function that is inconditionally called upon error instead.

This managed to stop a busy loop observed when a health check makes
use of the send-proxy protocol and fails before the connection can
be established.
2012-11-24 11:09:07 +01:00
Willy Tarreau
f0837b259b MEDIUM: tcp: add explicit support for delayed ACK in connect()
Commit 24db47e0 tried to improve support for delayed ACK upon connect
but it was incomplete, because checks with the proxy protocol would
always enable polling for data receive and there was no way of
distinguishing data polling and delayed ack.

So we add a distinct delack flag to the connect() function so that
the caller decides whether or not to use a delayed ack regardless
of pending data (eg: when send-proxy is in use). Doing so covers all
combinations of { (check with data), (sendproxy), (smart-connect) }.
2012-11-24 10:24:27 +01:00
Willy Tarreau
2b199c9ac3 MEDIUM: connection: provide a common conn_full_close() function
Several places got the connection close sequence wrong because it
was not obvious. In practice we always need the same sequence when
aborting, so let's have a common function for this.
2012-11-23 17:32:21 +01:00
Willy Tarreau
5a78f36db3 MAJOR: checks: rework completely bogus state machine
The porting of checks to using connections was totally bogus. Some checks
were considered successful as soon as the connection was established,
regardless of any response. Some errors would be triggered upon recv
if polling was enabled for send or if the send channel was shut down.

Now the behaviour is much better. It would be cleaner to perform the
fd_delete() in wake_srv_chk() and to process failures and timeouts
separately, but this is already a good start.
2012-11-23 12:47:05 +01:00
Willy Tarreau
d3aac7088e CLEANUP: checks: rename some server check flags
Some server check flag names were not properly choosen and cause
analysis trouble, especially the CHK_RUNNING one which does not
mean that a check is running but that the server is running...

Here's the rename :
  CHK_RUNNING -> CHK_PASSED
  CHK_ERROR   -> CHK_FAILED
2012-11-23 11:32:12 +01:00
Willy Tarreau
55058a7c1e MINOR: stats: report HTTP compression stats per frontend and per backend
It was a bit frustrating to have no idea about the bandwidth saved by
HTTP compression. Now we have per-frontend and per-backend stats. The
stats on the HTTP interface are shown in a hover title in the "bytes out"
column if at least something was fed to the compressor. 3 new columns
appeared in the CSV stats output.
2012-11-22 01:07:40 +01:00
Willy Tarreau
193b8c6168 MINOR: http: allow the cookie capture size to be changed
Some users need more than 64 characters to log large cookies. The limit
was set to 63 characters (and not 64 as previously documented). Now it
is possible to change this using the global "tune.http.cookielen" setting
if required.
2012-11-22 00:44:27 +01:00
Willy Tarreau
88c6d81386 MINOR: http: add some debugging functions to pretty-print msg state names
The http_msg_state_str() function reports a string containing the name of
the state passed in argument. This helps while debugging.
2012-11-21 21:50:04 +01:00
William Lallemand
072a2bf537 MINOR: compression: CPU usage limit
New option 'maxcompcpuusage' in global section.
Sets the maximum CPU usage HAProxy can reach before stopping the
compression for new requests or decreasing the compression level of
current requests.  It works like 'maxcomprate' but with the Idle.
2012-11-21 02:15:16 +01:00
William Lallemand
e3a7d99062 MINOR: compression: report zlib memory usage
Show the memory usage and the max memory available for zlib.
The value stored is now the memory used instead of the remaining
available memory.
2012-11-21 02:15:16 +01:00
William Lallemand
8b52bb3878 MEDIUM: compression: use pool for comp_ctx
Use pool for comp_ctx, it is allocated during the comp_algo->init().
The allocation of comp_ctx is accounted for in the zlib_memory_available.
2012-11-21 01:56:47 +01:00
Willy Tarreau
bc174aa144 MINOR: cli: report connection status in "show sess xxx"
Connection flags, targets and transport layers are now reported in
"show sess $PTR", as it is an absolute requirement in debugging.
2012-11-19 16:22:22 +01:00
William Lallemand
bf3ae61789 MEDIUM: compression: don't compress when no data
This patch makes changes in the http_response_forward_body state
machine. It checks if the compress algorithm had consumed data before
swapping the temporary and the input buffer. So it prevents null sized
zlib chunks.
2012-11-19 14:57:29 +01:00
Willy Tarreau
16a2147dfe MEDIUM: adjust the maxaccept per listener depending on the number of processes
global.tune.maxaccept was used for all listeners. This becomes really not
convenient when some listeners are bound to a single process and other ones
are bound to many processes.

Now we change the principle : we count the number of processes a listener
is bound to, and apply the maxaccept either entirely if there is a single
process, or divided by twice the number of processes in order to maintain
fairness.

The default limit has also been increased from 32 to 64 as it appeared that
on small machines, 32 was too low to achieve high connection rates.
2012-11-19 12:39:59 +01:00
Willy Tarreau
37994f034c MINOR: standard: add a simple popcount function
This function returns the number of ones in a word.
2012-11-19 12:12:24 +01:00
Emeric Brun
4f65bff1a5 MINOR: ssl: Add tune.ssl.lifetime statement in global.
Sets the ssl session <lifetime> in seconds. Openssl default is 300 seconds.
2012-11-16 16:47:20 +01:00
Willy Tarreau
fc6c032d8d MEDIUM: global: add support for CPU binding on Linux ("cpu-map")
The new "cpu-map" directive allows one to assign the CPU sets that
a process is allowed to bind to. This is useful in combination with
the "nbproc" and "bind-process" directives.

The support is implicit on Linux 2.6.28 and above.
2012-11-16 16:16:53 +01:00
William Lallemand
ec3e3890f0 BUG/MINOR: compression: deinit zlib only when required
The zlib stream was deinitialized even when the init failed.
2012-11-15 15:42:17 +01:00
Emeric Brun
4663577e24 MINOR: build: allow packagers to specify the ssl cache size
This is done by passing the default value to SSLCACHESIZE in sessions.
User can use tune.sslcachesize to change this value.
By default, it is set to 20000 sessions as openssl internal cache size.
Currently, a session entry size is between 592 and 616 bytes depending on the arch.
2012-11-15 10:52:19 +01:00
Willy Tarreau
3fdb366885 MAJOR: connection: replace struct target with a pointer to an enum
Instead of storing a couple of (int, ptr) in the struct connection
and the struct session, we use a different method : we only store a
pointer to an integer which is stored inside the target object and
which contains a unique type identifier. That way, the pointer allows
us to retrieve the object type (by dereferencing it) and the object's
address (by computing the displacement in the target structure). The
NULL pointer always corresponds to OBJ_TYPE_NONE.

This reduces the size of the connection and session structs. It also
simplifies target assignment and compare.

In order to improve the generated code, we try to put the obj_type
element at the beginning of all the structs (listener, server, proxy,
si_applet), so that the original and target pointers are always equal.

A lot of code was touched by massive replaces, but the changes are not
that important.
2012-11-12 00:42:33 +01:00
Willy Tarreau
128b03c9ab CLEANUP: stream_interface: remove the external task type target
Before connections were introduced, it was possible to connect an
external task to a stream interface. However it was left as an
exercise for the brave implementer to find how that ought to be
done.

The feature was broken since the introduction of connections and
was never fixed since due to lack of users. Better remove this dead
code now.
2012-11-11 23:14:16 +01:00
Willy Tarreau
b31c971bef CLEANUP: channel: remove any reference of the hijackers
Hijackers were functions designed to inject data into channels in the
distant past. They became unused around 1.3.16, and since there has
not been any user of this mechanism to date, it's uncertain whether
the mechanism still works (and it's not really useful anymore). So
better remove it as well as the pointer it uses in the channel struct.
2012-11-11 23:05:39 +01:00
Willy Tarreau
50fc7777c6 MEDIUM: http: refrain from sending "Connection: close" when Upgrade is present
Some servers are not totally HTTP-compliant when it comes to parsing the
Connection header. This is particularly true with WebSocket where it happens
from time to time that a server doesn't support having a "close" token along
with the "Upgrade" token in the Connection header. This broken behaviour has
also been noticed on some clients though the problem is less frequent on the
response path.

Sometimes the workaround consists in enabling "option http-pretend-keepalive"
to leave the request Connection header untouched, but this is not always the
most convenient solution. This patch introduces a new solution : haproxy now
also looks for the "Upgrade" token in the Connection header and if it finds
it, then it refrains from adding any other token to the Connection header
(though "keep-alive" and "close" may still be removed if found). The same is
done for the response headers.

This way, WebSocket much with less changes even when facing non-compliant
clients or servers. At least it fixes the DISCONNECT issue that was seen
on the websocket.org test.

Note that haproxy does not change its internal mode, it just refrains from
adding new tokens to the connection header.
2012-11-11 22:40:00 +01:00
Willy Tarreau
70c6fd82c3 MAJOR: polling: remove unused callbacks from the poller struct
Since no poller uses poller->{set,clr,wai,is_set,rem} anymore, let's
remove them and remove the associated pointer tests in proto/fd.h.
2012-11-11 21:02:34 +01:00
Willy Tarreau
e9f49e78fe MAJOR: polling: replace epoll with sepoll and remove sepoll
Now that all pollers make use of speculative I/O, there is no point
having two epoll implementations, so replace epoll with the sepoll code
and remove sepoll which has just become the standard epoll method.
2012-11-11 20:53:30 +01:00
Willy Tarreau
7f7ad91056 BUILD: stream_interface: remove si_fd() and its references
si_fd() is not used a lot, and breaks builds on OpenBSD 5.2 which
defines this name for its own purpose. It's easy enough to remove
this one-liner function, so let's do it.
2012-11-11 20:53:29 +01:00
Willy Tarreau
09f24569d4 REORG: fd: centralize the processing of speculative events
Speculative events are independant on the poller, so they can be
centralized in fd.c.
2012-11-11 17:45:39 +01:00
Willy Tarreau
6ea20b1acb REORG: fd: move the fd state management from ev_sepoll
ev_sepoll already provides everything needed to manage FD events
by only manipulating the speculative I/O list. Nothing there is
sepoll-specific so move all this to fd.
2012-11-11 17:45:39 +01:00
Willy Tarreau
7be79a41e1 REORG: fd: move the speculative I/O management from ev_sepoll
The speculative I/O will need to be ported to all pollers, so move
this to fd.c.
2012-11-11 17:45:39 +01:00
William Lallemand
d85f917daf MINOR: compression: maximum compression rate limit
This patch adds input and output rate calcutation on the HTTP compresion
feature.

Compression can be limited with a maximum rate value in kilobytes per
second. The rate is set with the global 'maxcomprate' option. You can
change this value dynamicaly with 'set rate-limit http-compression
global' on the UNIX socket.
2012-11-10 17:47:27 +01:00
William Lallemand
f3747837e5 MINOR: compression: tune.comp.maxlevel
This option allows you to set the maximum compression level usable by
the compression algorithm. It affects CPU usage.
2012-11-10 17:47:07 +01:00
Willy Tarreau
037d2c1f8f MAJOR: sepoll: make the poller totally event-driven
At the moment sepoll is not 100% event-driven, because a call to fd_set()
on an event which is already being polled will not change its state.

This causes issues with OpenSSL because if some I/O processing is interrupted
after clearing the I/O event (eg: read all data from a socket, can't put it
all into the buffer), then there is no way to call the SSL_read() again once
the buffer releases some space.

The only real solution is to go 100% event-driven. The principle is to use
the spec list as an event cache and that each time an I/O event is reported
by epoll_wait(), this event is automatically scheduled for addition to the
spec list for future calls until the consumer explicitly asks for polling
or stopping.

Doing this is a bit tricky because sepoll used to provide a substantial
number of optimizations such as event merging. These optimizations have
been maintained : a dedicated update list is affected when events change,
but not the event list, so that updates may cancel themselves without any
side effect such as displacing events. A specific case was considered for
handling newly created FDs as soon as they are detected from within the
poll loop. This ensures that their read or write operation will always be
attempted as soon as possible, thus reducing the number of poll loops and
process_session wakeups. This is especially true for newly accepted fds
which immediately perform their first recv() call.

Two new flags were added to the fdtab[] struct to tag the fact that a file
descriptor already exists in the update list. One flag indicates that a
file descriptor is new and has just been created (fdtab[].new) and the other
one indicates that a file descriptor is already referenced by the update list
(fdtab[].updated). Even if the FD state changes during operations or if the
fd is closed and replaced, it's not an issue because the update flag remains
and is easily spotted during list walks. The flag must absolutely reflect the
presence of the fd in the update list in order to avoid overflowing the update
list with more events than there are distinct fds.

Note that this change also recovers the small performance loss introduced
by its connection counter-part and goes even beyond.
2012-11-10 00:17:27 +01:00
Willy Tarreau
c8dd77fddf MAJOR: connection: remove the CO_FL_CURR_*_POL flag
This is the first step of a series of changes aiming at making the
polling totally event-driven. This first change consists in only
remembering at the connection level whether an FD was enabled or not,
regardless of the fact it was being polled or cached. From now on, an
EAGAIN will always be considered as a change so that the pollers are
able to manage a cache and to flush it based on such events. One of
the noticeable effect is that conn_fd_handler() is called once more
per session (6 instead of 5 min) but other update functions are less
called.

Note that the performance loss caused by this change at the moment is
quite significant, around 2.5%, but the change is needed to have SSL
working correctly in all situations, even when data were read from the
socket and stored in the invisible cache, waiting for some room in the
channel's buffer.
2012-11-09 22:09:33 +01:00
William Lallemand
9d5f5480fd MEDIUM: compression: limit RAM usage
With the global maxzlibmem option, you are able ton control the maximum
amount of RAM usable for HTTP compression.

A test is done before each zlib allocation, if the there isn't available
memory, the test fail and so the zlib initialization, so data won't be
compressed.
2012-11-08 15:23:30 +01:00
William Lallemand
2b50247695 MEDIUM: use pool for zlib
Don't use the zlib allocator anymore, 5 pools are used for the zlib
compression. Their sizes depends of the window size and the memLevel in
deflateInit2.
2012-11-08 15:23:29 +01:00
William Lallemand
a509e4c332 MINOR: compression: memlevel and windowsize
The window size and the memlevel of the zlib are now configurable using
global options tune.zlib.memlevel and tune.zlib.windowsize.

It affects the memory consumption of the zlib.
2012-11-08 15:23:29 +01:00
William Lallemand
08289f12f9 BUILD: remove dependency to zlib.h
The build was dependent of the zlib.h header, regardless of the USE_ZLIB
option. The fix consists of several #ifdef in the source code.

It removes the overhead of the zstream structure in the session when you
don't use the option.
2012-11-05 10:23:16 +01:00
William Lallemand
1c2d622d82 CLEANUP: use struct comp_ctx instead of union
Replace union comp_ctx by struct comp_ctx.

Use struct comp_ctx * in the init/add_data/flush/reset/end prototypes of
compression.h functions.
2012-11-05 10:23:16 +01:00
Willy Tarreau
ed7f836f07 BUG/MINOR: stream_interface: don't loop over ->snd_buf()
It is stupid to loop over ->snd_buf() because the snd_buf() itself already
loops and stops when system buffers are full. But looping again onto it,
we lose the information of the full buffers and perform one useless syscall.

Furthermore, this causes issues when dealing with large uploads while waiting
for a connection to establish, as it can report a server reject of some data
as a connection abort, which is wrong.

1.4 does not have this issue as it loops maximum twice (once for each buffer
half) and exists as soon as system buffers are full. So no backport is needed.
2012-10-29 23:30:33 +01:00
Willy Tarreau
07115412d3 MEDIUM: stick-table: allocate the table key of size buffer size
Keys are copied from samples to stick_table_key. If a key is larger
than the stick_table_key, we have an overflow. In pratice it does not
happen because it requires :
   1) a configuration with tune.bufsize larger than BUFSIZE (common)
   2) a stick-table configured with keys strictly larger than buffers
   3) extraction of data larger than BUFSIZE (eg: using payload())

Points 2 and 3 don't make any sense for a real world configuration. That
said the issue needs be fixed. The solution consists in allocating it the
same size as the global buffer size, just like the samples. This fixes the
issue.
2012-10-29 21:56:59 +01:00
Willy Tarreau
7e2c647ee7 MEDIUM: remove remains of BUFSIZE in HTTP auth and sample conversions
Sample conversions rely on two alternative buffers which were previously
allocated as static bufs of size BUFSIZE. Now they're initialized to the
global buffer size. It was the same for HTTP authentication. Note that it
seems that none of them was prone to any mistake when dealing with the
buffer size, but better stay on the safe side by maintaining the old
assumption that a trash buffer is always "large enough".
2012-10-29 20:44:36 +01:00
Willy Tarreau
19d14ef104 MEDIUM: make the trash be a chunk instead of a char *
The trash is used everywhere to store the results of temporary strings
built out of s(n)printf, or as a storage for a chunk when chunks are
needed.

Using global.tune.bufsize is not the most convenient thing either.

So let's replace trash with a chunk and directly use it as such. We can
then use trash.size as the natural way to get its size, and get rid of
many intermediary chunks that were previously used.

The patch is huge because it touches many areas but it makes the code
a lot more clear and even outlines places where trash was used without
being that obvious.
2012-10-29 16:57:30 +01:00
Willy Tarreau
7780473c3b CLEANUP: replace chunk_printf() with chunk_appendf()
This function's naming was misleading as it is used to append data
at the end of a string, causing some surprizes when used for the
first time!

Add a chunk_printf() function which does what its name suggests.
2012-10-29 16:14:26 +01:00
Willy Tarreau
c26ac9deea MINOR: chunk: add a function to reset a chunk
This is a first step in avoiding to constantly reinitialize chunks.
It replaces the old chunk_reset() which was not properly named as it
used to drop everything and was only used by chunk_destroy(). It has
been renamed chunk_drop().
2012-10-29 13:33:42 +01:00
Yuxans Yao
4e25b015a7 MINOR: log: add '%Tl' to log-format
The '%Tl' is similar to '%T', but using local timezone.
2012-10-29 11:55:26 +01:00
Willy Tarreau
70737d142f MINOR: compression: add an offload option to remove the Accept-Encoding header
This is used when it is desired that backend servers don't compress
(eg: because of buggy implementations).
2012-10-27 01:13:24 +02:00
Willy Tarreau
f2943dccd0 MAJOR: session: detach the connections from the stream interfaces
We will need to be able to switch server connections on a session and
to keep idle connections. In order to achieve this, the preliminary
requirement is that the connections can survive the session and be
detached from them.

Right now they're still allocated at exactly the same place, so when
there is a session, there are always 2 connections. We could soon
improve on this by allocating the outgoing connection only during a
connect().

This current patch touches a lot of code and intentionally does not
change any functionnality. Performance tests show no regression (even
a very minor improvement). The doc has not yet been updated.
2012-10-26 20:15:20 +02:00
Willy Tarreau
c919dc66a3 CLEANUP: remove trashlen
trashlen is a copy of global.tune.bufsize, so let's stop using it as
a duplicate, fall back to the original bufsize, it's less confusing
this way.
2012-10-26 20:04:27 +02:00
Willy Tarreau
422a0a5161 MINOR: tools: add a clear_addr() function to unset an address
This will be used to unset a from address.
2012-10-26 20:04:26 +02:00
Emeric Brun
a7aa309c44 MINOR: ssl: add 'crt' statement on server.
crt: client certificate to send
2012-10-26 15:10:10 +02:00
William Lallemand
82fe75c1a7 MEDIUM: HTTP compression (zlib library support)
This commit introduces HTTP compression using the zlib library.

http_response_forward_body has been modified to call the compression
functions.

This feature includes 3 algorithms: identity, gzip and deflate:

  * identity: this is mostly for debugging, and it was useful for
  developping the compression feature. With Content-Length in input, it
  is making each chunk with the data available in the current buffer.
  With chunks in input, it is rechunking, the output chunks will be
  bigger or smaller depending of the size of the input chunk and the
  size of the buffer. Identity does not apply any change on data.

  * gzip: same as identity, but applying a gzip compression. The data
  are deflated using the Z_NO_FLUSH flag in zlib. When there is no more
  data in the input buffer, it flushes the data in the output buffer
  (Z_SYNC_FLUSH). At the end of data, when it receives the last chunk in
  input, or when there is no more data to read, it writes the end of
  data with Z_FINISH and the ending chunk.

  * deflate: same as gzip, but with deflate algorithm and zlib format.
  Note that this algorithm has ambiguous support on many browsers and
  no support at all from recent ones. It is strongly recommended not
  to use it for anything else than experimentation.

You can't choose the compression ratio at the moment, it will be set to
Z_BEST_SPEED (1), as tests have shown very little benefit in terms of
compression ration when going above for HTML contents, at the cost of
a massive CPU impact.

Compression will be activated depending of the Accept-Encoding request
header. With identity, it does not take care of that header.

To build HAProxy with zlib support, use USE_ZLIB=1 in the make
parameters.

This work was initially started by David Du Colombier at Exceliance.
2012-10-26 02:30:48 +02:00
Willy Tarreau
54d23dfc07 CLEANUP: http: rename HTTP_MSG_DATA_CRLF state
This state's name is confusing as it is only used with chunked encoding
and makes newcomers think it's also related to the content-length. Let's
call it CHUNK_CRLF to clear any doubt on this.
2012-10-26 01:13:52 +02:00
Willy Tarreau
3dd0c4e20e OPTIM: tools: inline hex2i()
This tiny function was not inlined because initially not much used.
However it's been used un the chunk parser for a while and it became
one of the most CPU-cycle eater there. By inlining it, the chunk parser
speed was increased by 74 %. We're almost 3 times faster than original
with just the last 4 commits.
2012-10-26 01:13:24 +02:00
Willy Tarreau
55a6906125 OPTIM: channel: inline channel_forward's fast path
Most calls to channel_forward() are performed with short byte counts and
are already optimized in channel_forward() taking just a few instructions.
Thus it's a waste of CPU cycles to call a function for this, let's just
inline the short byte count case and fall back to the common one for
remaining situations.

Doing so has increased the chunked encoding parser's performance by 12% !
2012-10-26 01:08:01 +02:00
Emeric Brun
a068a2951d MINOR: sample: export 'sample_get_trash_chunk(void)'
This will be used on external fetch modules.
2012-10-22 18:54:24 +02:00
Emeric Brun
07ca496ea9 MINOR: acl: add parse and match primitives to use binary type on ACLs
Binary ACL match patterns can now be entered as hex digit strings.
2012-10-22 18:54:24 +02:00
Willy Tarreau
2e845be249 MEDIUM: sample: pass an empty list instead of a null for fetch args
ACL and sample fetches use args list and it is really not convenient to
check for null args everywhere. Now for empty args we pass a constant
list of end of lists. It will allow us to remove many useless checks.
2012-10-19 19:49:09 +02:00
Willy Tarreau
ad8f8e8ffb MINOR: chunk: provide string compare functions
It's sometimes needed to be able to compare a zero-terminated string with a
chunk, so we now have two functions to do that, one strcmp() equivalent and
one strcasecmp() equivalent.
2012-10-19 15:18:06 +02:00
Willy Tarreau
6c9a3d5585 MEDIUM: ssl: add support for the "npn" bind keyword
The ssl_npn match could not work by itself because clients do not use
the NPN extension unless the server advertises the protocols it supports.
Thanks to Simone Bordet for the explanations on how to get it right.
2012-10-18 19:03:00 +02:00
Willy Tarreau
378e041797 OPTIM: connection: pack the struct target
The struct target contains one int and one pointer, causing it to be
64-bit aligned on 64-bit platforms. By marking it "packed", we can
save 8 bytes in struct connection and as many in struct session on
such platforms.
2012-10-13 14:33:58 +02:00
Willy Tarreau
109e95a1b4 OPTIM: session: reorder struct session fields
A reorering of the struct session fields has increased overall performance
by almost 1% due to better cache usage.
2012-10-13 11:22:24 +02:00
Willy Tarreau
c93f7959e5 CLEANUP: session: remove term_trace which is not used anymore
This field was used to trace precisely where a session was terminated
but it did not survive code rearchitecture and was not used at all
anymore. Let's get rid of it.
2012-10-13 11:10:30 +02:00
Willy Tarreau
0a8535fec8 OPTIM: channel: reorganize struct members to improve cache efficiency
Now that the buffer is moved out of the channel, it is possible to move
the pointer earlier in the struct and reorder some fields. This new
ordering improves overall performance by 2%, mainly saved in the HTTP
parsers and data transfers.
2012-10-13 10:55:22 +02:00
Willy Tarreau
9b28e03b66 MAJOR: channel: replace the struct buffer with a pointer to a buffer
With this commit, we now separate the channel from the buffer. This will
allow us to replace buffers on the fly without touching the channel. Since
nobody is supposed to keep a reference to a buffer anymore, doing so is not
a problem and will also permit some copy-less data manipulation.

Interestingly, these changes have shown a 2% performance increase on some
workloads, probably due to a better cache placement of data.
2012-10-13 09:07:52 +02:00
Willy Tarreau
974ced6305 CLEANUP: channel: use 'chn' instead of 'buf' as local variable names
It's too confusing to see buf->buf everywhere where the first buf is
a channel. Let's fix this now.
2012-10-12 23:11:02 +02:00
Willy Tarreau
394db379eb REORG: http: rename msg->buf to msg->chn since it's a channel
It's extremely confusing to have all those msg->buf->buf everywhere after
the extraction of the buffer from the channel. Let's clean this up.
2012-10-12 22:40:39 +02:00
Willy Tarreau
ffc3fcd6da MEDIUM: log: report SSL ciphers and version in logs using logformat %sslc/%sslv
These two new log-format tags report the SSL protocol version (%sslv) and the
SSL ciphers (%sslc) used for the connection with the client. For instance, to
append these information just after the client's IP/port address information
on an HTTP log line, use the following configuration :

    log-format %Ci:%Cp\ %sslv:%sslc\ [%t]\ %ft\ %b/%s\ %Tq/%Tw/%Tc/%Tr/%Tt\ %st\ %B\ %cc\ \ %cs\ %tsc\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq\ %hr\ %hs\ %{+Q}r

It will report a line such as the following one :

    Oct 12 20:47:30 haproxy[9643]: 127.0.0.1:43602 TLSv1:AES-SHA [12/Oct/2012:20:47:30.303] stick2~ stick2/s1 7/0/12/0/19 200 145 - - ---- 0/0/0/0/0 0/0 "GET /?t=0 HTTP/1.0"
2012-10-12 20:48:51 +02:00
Willy Tarreau
4f65356a22 MINOR: log: make lf_text use a const char *
lf_text() should use a const char * otherwise it makes it more complex
to use data coming from const strings.
2012-10-12 20:30:51 +02:00
Willy Tarreau
93dbc2bc0e MEDIUM: log: add a new LW_XPRT flag to pin the transport layer
This flag will have to be set on log tags which require transport layer
information. They will prevent the conn_xprt_close() call from releasing
the transport layer too early.
2012-10-12 20:30:51 +02:00
Willy Tarreau
1e954913de MEDIUM: connection: add a flag to hold the transport layer
When we start logging SSL information, we need the SSL struct to be
present even past the conn_xprt_close() call. In order to achieve this,
we should use refcounting on the connection and the transport layer. At
the moment it's not worth using plain refcounting as only the logs require
this, so instead of real refcounting we just use a flag which will be set
by the log subsystem when SSL data need to be logged.

What happens then is that the xprt->close() call is ignored and the
transport layer is closed again during session_free(), after the log
line is emitted.
2012-10-12 20:30:50 +02:00
Willy Tarreau
6c03a64978 MEDIUM: connection: always unset the transport layer upon close
When calling conn_xprt_close(), we always clear the transport pointer
so that all transport layers leave the connection in the same state after
a close. This will also make it safer and cheaper to call conn_xprt_close()
multiple times if needed.
2012-10-12 17:03:04 +02:00
Willy Tarreau
773d65f413 MEDIUM: log: suffix the frontend's name with '~' when using SSL
Until now it was not possible to know from the logs whether the incoming
connection was made over SSL or not. In order to address this in the existing
log formats, a new log format %ft was introduced, to log the frontend's name
suffixed with its transport layer. The only transport layer in use right now
is '~' for SSL, so that existing log formats for non-SSL traffic are not
affected at all, and SSL log formats have the frontend's name suffixed with
'~'.

The TCP, HTTP and CLF log format now use %ft instead of %f. This does not
affect existing log formats which still make use of %f however.
2012-10-12 14:56:11 +02:00
Emeric Brun
ef42d9219d MINOR: ssl: add statements 'verify', 'ca-file' and 'crl-file' on servers.
It now becomes possible to verify the server's certificate using the "verify"
directive. This one only supports "none" and "required", as it does not make
much sense to also support "optional" here.
2012-10-12 12:05:15 +02:00
Emeric Brun
f9c5c4701c MINOR: ssl: add statement 'no-tls-tickets' on server side. 2012-10-12 11:48:55 +02:00
Emeric Brun
94324a4c87 MINOR: ssl: move ssl context init for servers from cfgparse.c to ssl_sock.c 2012-10-12 11:37:36 +02:00
Emeric Brun
992adc9210 BUG/MINOR: ssl: Fix issue on server statements 'no-tls*' and 'no-sslv3'
bit field collision with 'force-tlsv*'.
2012-10-12 11:27:57 +02:00
Willy Tarreau
21faa91be6 MINOR: server: add minimal infrastructure to parse keywords
Just like with the "bind" lines, we'll switch the "server" line
parsing to keyword registration. The code is essentially the same
as for bind keywords, with minor changes such as support for the
default-server keywords and support for variable argument count.
2012-10-10 17:42:39 +02:00
Willy Tarreau
7fca87fd9d BUILD: accept4: move the socketcall declaration outside of accept4()
Gcc 4.2.4 breaks on the syscall declared inside the function, move it
outside and declare it static inline.
2012-10-10 17:42:39 +02:00
Willy Tarreau
1bc4aab290 MEDIUM: listener: add support for linux's accept4() syscall
On Linux, accept4() does the same as accept() except that it allows
the caller to specify some flags to set on the resulting socket. We
use this to set the O_NONBLOCK flag and thus to save one fcntl()
call in each connection. The effect is a small performance gain of
around 1%.

The option is automatically enabled when target linux2628 is set, or
when the USE_ACCEPT4 Makefile variable is set. If the libc is too old
to provide the equivalent function, this is automatically detected and
our own function is used instead. In any case it is possible to force
the use of our implementation with USE_MY_ACCEPT4.
2012-10-08 20:11:03 +02:00
Willy Tarreau
1b6c00cb99 BUG/MAJOR: ensure that hdr_idx is always reserved when L7 fetches are used
Baptiste Assmann reported a bug causing a crash on recent versions when
sticking rules were set on layer 7 in a TCP proxy. The bug is easier to
reproduce with the "defer-accept" option on the "bind" line in order to
have some contents to parse when the connection is accepted. The issue
is that the acl_prefetch_http() function called from HTTP fetches relies
on hdr_idx to be preinitialized, which is not the case if there is no L7
ACL.

The solution consists in adding a new SMP_CAP_L7 flag to fetches to indicate
that they are expected to work on L7 data, so that the proxy knows that the
hdr_idx has to be initialized. This is already how ACL and HTTP mode are
handled.

The bug was present since 1.5-dev9.
2012-10-05 22:46:09 +02:00
Emeric Brun
76d8895c49 MINOR: ssl: add defines LISTEN_DEFAULT_CIPHERS and CONNECT_DEFAULT_CIPHERS.
These ones are used to set the default ciphers suite on "bind" lines and
"server" lines respectively, instead of using OpenSSL's defaults. These
are probably mainly useful for distro packagers.
2012-10-05 22:11:15 +02:00
Emeric Brun
8694b9a682 MINOR: ssl: add 'force-sslv3' and 'force-tlsvXX' statements on server
These options force the SSL lib to use the specified protocol when
connecting to a server. They are complentary to no-tlsv*/no-sslv3.
2012-10-05 22:05:04 +02:00
Emeric Brun
2cb7ae5302 MINOR: ssl: add 'force-sslv3' and 'force-tlsvXX' statements on bind.
These options force the SSL lib to use the specified protocol. They
are complentary to no-tlsv*/no-sslv3.
2012-10-05 22:02:42 +02:00
Emeric Brun
8967549d52 MINOR: ssl: use bit fields to store ssl options instead of one int each
Too many SSL options already and some still to come, use a bit field
and get rid of all the integers. No functional change here.
2012-10-05 21:53:59 +02:00
Emeric Brun
fb510ea2b9 MEDIUM: conf: rename 'cafile' and 'crlfile' statements 'ca-file' and 'crl-file'
These names were not really handy.
2012-10-05 21:50:43 +02:00
Emeric Brun
9b3009b440 MEDIUM: conf: rename 'nosslv3' and 'notlsvXX' statements 'no-sslv3' and 'no-tlsvXX'.
These ones were really not easy to read nor write, and become confusing
with the next ones to be added.
2012-10-05 21:47:42 +02:00
Emeric Brun
c8e8d12257 MINOR: ssl: add 'crt-base' and 'ca-base' global statements.
'crt-base' sets root directory used for relative certificates paths.
'ca-base' sets root directory used for relative CAs and CRLs paths.
2012-10-05 21:46:52 +02:00
Emeric Brun
3c4bc6e10a MINOR: ssl: remove prefer-server-ciphers statement and set it as the default on ssl listeners. 2012-10-05 20:02:06 +02:00
Willy Tarreau
1c862c5920 MEDIUM: tcp: enable TCP Fast Open on systems which support it
If TCP_FASTOPEN is defined, then the "tfo" option is supported on
"bind" lines to enable TCP Fast Open (linux >= 3.6).
2012-10-05 16:22:35 +02:00
Willy Tarreau
6c16adc661 MEDIUM: checks: enable the PROXY protocol with health checks
When health checks are configured on a server which has the send-proxy
directive and no "port" nor "addr" settings, the health check connections
will automatically use the PROXY protocol. If "port" or "addr" are set,
the "check-send-proxy" directive may be used to force the protocol.
2012-10-05 00:33:14 +02:00
Willy Tarreau
f4288ee4ba MEDIUM: check: add the ctrl and transport layers in the server check structure
Since it's possible for the checks to use a different protocol or transport layer
than the prod traffic, we need to have them referenced in the server. The
SSL checks are not enabled yet, but the transport layers are completely used.
2012-10-05 00:33:14 +02:00
Willy Tarreau
1ae1b7b53c MEDIUM: checks: use real buffers to store requests and responses
Till now the request was made in the trash and sent to the network at
once, and the response was read into a preallocated char[]. Now we
allocate a full buffer for both the request and the response, and make
use of it.

Some of the operations will probably be replaced later with buffer macros
but the point was to ensure we could migrate to use the data layers soon.

One nice improvement caused by this change is that requests are now formed
at the beginning of the check and may safely be sent in multiple chunks if
needed.
2012-10-05 00:33:14 +02:00
Willy Tarreau
5b3a202f78 REORG: server: move the check-specific parts into a check subsection
The health checks in the servers are becoming a real mess, move them
into their own subsection. We'll soon need to have a struct buffer to
replace the char * as well as check-specific protocol and transport
layers.
2012-10-05 00:33:14 +02:00
Willy Tarreau
5f1504f524 MEDIUM: connection: add a new local send-proxy transport callback
This callback sends a PROXY protocol line on the outgoing connection,
with the local and remote endpoint information. This is used for local
connections (eg: health checks) where the other end needs to have a
valid address and no connection is relayed.
2012-10-05 00:32:35 +02:00
Willy Tarreau
e1e4a61e7a REORG: connection: move the PROXY protocol management to connection.c
It was previously in frontend.c but there is no reason for this anymore
considering that all the information involved is in the connection itself
only. Theorically this should be in the socket layer but we don't have
this yet.
2012-10-05 00:32:33 +02:00
Willy Tarreau
0ffde2cc3f MEDIUM: connection: automatically disable polling on error
We absolutely want to disable FD polling after an error is detected,
otherwise the data layer has to do it and it's far from being obvious
at these layers.

The way we did it was a bit tricky in conn_update_*_polling and
conn_*_polling_changes. However it has almost no impact on performance
and code size both for the fast and slow path.

We'll now be able to remove some flag updates in the stream interface.
2012-10-04 22:26:11 +02:00
Willy Tarreau
2396c1c4a2 MEDIUM: connection: make it possible for data->wake to return an error
Just like ->init(), ->wake() may now be used to return an error and
abort the connection. Currently this is not used but will be with
embryonic sessions.
2012-10-04 22:26:10 +02:00
Willy Tarreau
9e272bf95d MEDIUM: connection: only call the data->wake callback on activity
We now check the connection flags for changes in order not to call the
data->wake callback when there is no activity. Activity means a change
on any of the CO_FL_*_SH, CO_FL_ERROR, CO_FL_CONNECTED, CO_FL_WAIT_CONN*
flags, as well as a call to data->recv or data->send.
2012-10-04 22:26:10 +02:00
Willy Tarreau
f3a6d7e115 MEDIUM: connection: reorganize connection flags
The connection flags have progressively been added one after the other
and were not very well organized. Some of them are often used together
and a number of operations are performed on the DATA/SOCK ENA/POL flags.
Thus, they have been reorganized so that flags that work together are
close to each other (allows immediate operands on ARM) and that polling
changes can be detected with fewer operations using a simple shift and
xor. The handshakes are now the last ones so that it will be easier to
add new ones after without risking a collision. All activity-related
flags are also grouped together.
2012-10-04 22:26:10 +02:00
Willy Tarreau
071e137ec2 MEDIUM: connection: use a generic data-layer init() callback
The generic data-layer init callback is now used after the transport
layer is complete and before calling the data layer recv/send callbacks.

This allows the session to switch from the embryonic session data layer
to the complete stream interface data layer, by making conn_session_complete()
the data layer's init callback.

It sill looks awkwards that the init() callback must be used opon error,
but except by adding yet another one, it does not seem to be mergeable
into another function (eg: it should probably not be merged with ->wake
to avoid unneeded calls during the handshake, though semantically that
would make sense).
2012-10-04 22:26:10 +02:00
Willy Tarreau
f4e114fe54 MINOR: connection: add an init callback to the data_cb struct
This callback is used to initialize the data layer.
2012-10-04 22:26:10 +02:00
Willy Tarreau
bd99aab91f MINOR: connection: split conn_prepare() in two functions
We'll also need a function to takeover an existing connection without
reinitializing it. The same will be needed at the stream interface level.
2012-10-04 22:26:10 +02:00
Willy Tarreau
4aa3683b2d MINOR: connection: provide a generic data layer wakeup callback
Instead of calling conn_notify_si() from the connection handler, we
now call data->wake(), which will allow us to use a different callback
with health checks.

Note that we still rely on a flag in order to decide whether or not
to call this function. The reason is that with embryonic sessions,
the callback is already initialized to si_conn_cb without the flag,
and we can't call the SI notify function in the leave path before
the stream interface is initialized.

This issue should be addressed by involving a different data_cb for
embryonic sessions and for stream interfaces, that would be changed
during session_complete() for the final data_cb.
2012-10-04 22:26:10 +02:00
Willy Tarreau
74beec32a5 REORG: connection: rename app_cb "data"
Now conn->data will designate the data layer which is the client for
the transport layer. In practice it's the stream interface and will
soon also be the health checks.
2012-10-04 22:26:10 +02:00
Willy Tarreau
f7bc57ca6e REORG: connection: rename the data layer the "transport layer"
While working on the changes required to make the health checks use the
new connections, it started to become obvious that some naming was not
logical at all in the connections. Specifically, it is not logical to
call the "data layer" the layer which is in charge for all the handshake
and which does not yet provide a data layer once established until a
session has allocated all the required buffers.

In fact, it's more a transport layer, which makes much more sense. The
transport layer offers a medium on which data can transit, and it offers
the functions to move these data when the upper layer requests this. And
it is the upper layer which iterates over the transport layer's functions
to move data which should be called the data layer.

The use case where it's obvious is with embryonic sessions : an incoming
SSL connection is accepted. Only the connection is allocated, not the
buffers nor stream interface, etc... The connection handles the SSL
handshake by itself. Once this handshake is complete, we can't use the
data functions because the buffers and stream interface are not there
yet. Hence we have to first call a specific function to complete the
session initialization, after which we'll be able to use the data
functions. This clearly proves that SSL here is only a transport layer
and that the stream interface constitutes the data layer.

A similar change will be performed to rename app_cb => data, but the
two could not be in the same commit for obvious reasons.
2012-10-04 22:26:09 +02:00
Willy Tarreau
8c89c2059f MINOR: buffers: add a few functions to write chars, strings and blocks
bo_put{chr,blk,str,chk} are used to write data on the output of a buffer.
Output is truncated if the buffer is not large enough.
2012-10-04 22:26:09 +02:00
Willy Tarreau
8113a5d78f BUG/MINOR: config: use a copy of the file name in proxy configurations
Each proxy contains a reference to the original config file and line
number where it was declared. The pointer used is just a reference to
the one passed to the function instead of being duplicated. The effect
is that it is not valid anymore at the end of the parsing and that all
proxies will be enumerated as coming from the same file on some late
configuration errors. This may happen for exmaple when reporting SSL
certificate issues.

By copying using strdup(), we avoid this issue.

1.4 has the same issue, though no report of the proxy file name is done
out of the config section. Anyway a backport is recommended to ease
post-mortem analysis.
2012-10-04 08:13:32 +02:00
Willy Tarreau
d1a33e35fb BUG/MEDIUM: proxy: must not try to stop disabled proxies upon reload
Herv Commowick reported an issue : haproxy dies in a segfault during a
soft restart if it tries to pause a disabled proxy. This is because disabled
proxies have no management task so we must not wake the task up. This could
easily remain unnoticed since the old process was expected to go away, so
having it go away faster was not really troubling. However, with sync peers,
it is obvious that there is no peer sync during this reload.

This issue has been introduced in 1.5-dev7 with the removal of the
maintain_proxies() function. No backport is needed.
2012-10-04 00:20:55 +02:00
Emeric Brun
2d0c482682 MINOR: ssl: add statement 'no-tls-tickets' on bind to disable stateless session resumption
Disables the stateless session resumption (RFC 5077 TLS Ticket extension)
and force to use stateful session resumption.
Stateless session resumption is more expensive in CPU usage.
2012-10-02 16:05:33 +02:00
Emeric Brun
c0ff4924c0 MINOR: ssl : add statements 'notlsv11' and 'notlsv12' and rename 'notlsv1' to 'notlsv10'.
This is because "notlsv1" used to disable TLSv1.0 only and had no effect
on v1.1/v1.2. so better have an option for each version. This applies both
to "bind" and "server" statements.
2012-10-02 08:34:38 +02:00
Emeric Brun
9faf071acb MINOR: ssl: add build param USE_PRIVATE_CACHE to build cache without shared memory
It removes dependencies with futex or mutex but ssl performances decrease
using nbproc > 1 because switching process force session renegotiation.

This can be useful on small systems which never intend to run in multi-process
mode.
2012-10-02 08:34:38 +02:00
Emeric Brun
4b3091e54e MINOR: ssl: disable shared memory and locks on session cache if nbproc == 1
We don't needa to lock the memory when there is a single process. This can
make a difference on small systems where locking is much more expensive than
just a test.
2012-10-02 08:34:38 +02:00
Emeric Brun
81c00f0a7a MINOR: ssl: add ignore verify errors options
Allow to ignore some verify errors and to let them pass the handshake.

Add option 'crt-ignore-err <list>'
Ignore verify errors at depth == 0 (client certificate)
<list> is string 'all' or a comma separated list of verify error IDs
(see http://www.openssl.org/docs/apps/verify.html)

Add option 'ca-ignore-err <list>'
Same as 'crt-ignore-err' for all depths > 0 (CA chain certs)

Ex ignore all errors on CA and expired or not-yet-valid errors
on client certificate:

bind 0.0.0.0:443 ssl crt crt.pem verify required
 cafile ca.pem ca-ignore-err all crt-ignore-err 10,9
2012-10-02 08:32:50 +02:00
Emeric Brun
d94b3fe98f MEDIUM: ssl: add client certificate authentication support
Add keyword 'verify' on bind:
'verify none': authentication disabled (default)
'verify optional': accept connection without certificate
                   and process a verify if the client sent a certificate
'verify required': reject connection without certificate
                   and process a verify if the client send a certificate

Add keyword 'cafile' on bind:
'cafile <path>' path to a client CA file used to verify.
'crlfile <path>' path to a client CRL file used to verify.
2012-10-02 08:04:49 +02:00
Emeric Brun
2b58d040b6 MINOR: ssl: add elliptic curve Diffie-Hellman support for ssl key generation
Add 'ecdhe' on 'bind' statement: to set named curve used to generate ECDHE keys
(ex: ecdhe secp521r1)
2012-10-02 08:03:21 +02:00
Willy Tarreau
cd379950a7 MINOR: connection: add a pointer to the connection owner
This will be needed to find the stream interface from the connection
once they're detached, but in the more immediate term, we'll need this
for health checks since they don't use a stream interface.
2012-09-28 00:01:22 +02:00
Willy Tarreau
dda5e7c986 CLEANUP: connection: offer conn_prepare() to set up a connection
This will be used by checks as well as stream interfaces.
2012-09-24 22:49:06 +02:00
Willy Tarreau
c53d42256d MEDIUM: stats: remove the stats_sock struct from the global struct
Now the stats socket is allocated when the 'stats socket' line is parsed,
and assigned using the standard str2listener(). This has two effects :
  - more than one stats socket can now be declared
  - stats socket now support protocols other than UNIX

The next step is to remove the duplicate bind config parsing.
2012-09-24 10:53:16 +02:00
Willy Tarreau
4fbb2285e2 MINOR: config: make str2listener() use memprintf() to report errors.
This will make it possible to use the function for other listening
sockets.
2012-09-24 10:53:16 +02:00
Willy Tarreau
eb6cead1de MINOR: standard: make memprintf() support a NULL destination
Doing so removes many checks that were systematically made because
the callees don't know if the caller passed a valid pointer.
2012-09-24 10:53:16 +02:00
Willy Tarreau
ce39bfb7c4 BUG: backend: balance hdr was broken since 1.5-dev11
Alex Markham reported and diagnosed a bug appearing on 1.5-dev11,
causing a crash on x86_64 when header hashing is used. The cause is
a missing (int) cast causing a negative offset to appear positive
and the resulting pointer to go out of bounds.

The crash is not possible anymore since 1.5-dev12 because a second
bug caused the negative sign to disappear so the pointer is always
within range but always wrong, so balance hdr() never works anymore.

This fix restores the correct behaviour and ensures the sign is
correct.
2012-09-22 18:36:29 +02:00
Willy Tarreau
290e63aa87 REORG: listener: move unix perms from the listener to the bind_conf
Unix permissions are per-bind configuration line and not per listener,
so let's concretize this in the way the config is stored. This avoids
some unneeded loops to set permissions on all listeners.

The access level is not part of the unix perms so it has been moved
away. Once we can use str2listener() to set all listener addresses,
we'll have a bind keyword parser for this one.
2012-09-20 18:07:14 +02:00
Willy Tarreau
4348fad1c1 MAJOR: listeners: use dual-linked lists to chain listeners with frontends
Navigating through listeners was very inconvenient and error-prone. Not to
mention that listeners were linked in reverse order and reverted afterwards.
In order to definitely get rid of these issues, we now do the following :
  - frontends have a dual-linked list of bind_conf
  - frontends have a dual-linked list of listeners
  - bind_conf have a dual-linked list of listeners
  - listeners have a pointer to their bind_conf

This way we can now navigate from anywhere to anywhere and always find the
proper bind_conf for a given listener, as well as find the list of listeners
for a current bind_conf.
2012-09-20 16:48:07 +02:00
Willy Tarreau
28a47d6408 MINOR: config: pass the file and line to config keyword parsers
This will be needed when we need to create bind config settings.
2012-09-18 20:02:48 +02:00
Willy Tarreau
51fb7651c4 MINOR: listener: add a scope field in the bind keyword lists
This scope is used to report what the keywords are used for (eg: TCP,
UNIX, ...). It is now reported by bind_dump_kws().
2012-09-18 18:27:14 +02:00
Willy Tarreau
8638f4850f MEDIUM: config: enumerate full list of registered "bind" keywords upon error
When an unknown "bind" keyword is detected, dump the list of all
registered keywords. Unsupported default alternatives are also reported
as "not supported".
2012-09-18 18:27:14 +02:00
Willy Tarreau
79eeafacb4 MEDIUM: move bind SSL parsing to ssl_sock
Registering new SSL bind keywords was not particularly handy as it required
many #ifdef in cfgparse.c. Now the code has moved to ssl_sock.c which calls
a register function for all the keywords.

Error reporting was also improved by this move, because the called functions
build an error message using memprintf(), which can span multiple lines if
needed, and each of these errors will be displayed indented in the context of
the bind line being processed. This is important when dealing with certificate
directories which can report multiple errors.
2012-09-18 16:20:01 +02:00
Willy Tarreau
269826659d MEDIUM: listener: add a minimal framework to register "bind" keyword options
With the arrival of SSL, the "bind" keyword has received even more options,
all of which are processed in cfgparse in a cumbersome way. So it's time to
let modules register their own bind options. This is done very similarly to
the ACLs with a small difference in that we make the difference between an
unknown option and a known, unimplemented option.
2012-09-15 22:33:08 +02:00
Willy Tarreau
88500de69e CLEANUP: listener: remove unused conf->file and conf->line
These ones are already in bind_conf.
2012-09-15 22:29:33 +02:00
Willy Tarreau
2a65ff014e MEDIUM: config: replace ssl_conf by bind_conf
Some settings need to be merged per-bind config line and are not necessarily
SSL-specific. It becomes quite inconvenient to have this ssl_conf SSL-specific,
so let's replace it with something more generic.
2012-09-15 22:29:33 +02:00
Willy Tarreau
d1d5454180 REORG: split "protocols" files into protocol and listener
It was becoming confusing to have protocols and listeners in the same
files, split them.
2012-09-15 22:29:32 +02:00
Willy Tarreau
21c705b0f8 MINOR: config: add a function to indent error messages
Bind parsers may return multiple errors, so let's make use of a new function
to re-indent multi-line error messages so that they're all reported in their
context.
2012-09-15 22:29:27 +02:00
Willy Tarreau
2e1dca8f52 MEDIUM: http: add "redirect scheme" to ease HTTP to HTTPS redirection
For instance :

   redirect scheme https if !{ is_ssl }
2012-09-12 08:43:15 +02:00
Emeric Brun
fc0421fde9 MEDIUM: ssl: add support for SNI and wildcard certificates
A side effect of this change is that the "ssl" keyword on "bind" lines is now
just a boolean and that "crt" is needed to designate certificate files or
directories.

Note that much refcounting was needed to have the free() work correctly due to
the number of cert aliases which can make a context be shared by multiple names.
2012-09-10 09:27:02 +02:00
Willy Tarreau
f5ae8f7637 MEDIUM: config: centralize handling of SSL config per bind line
SSL config holds many parameters which are per bind line and not per
listener. Let's use a per-bind line config instead of having it
replicated for each listener.

At the moment we only do this for the SSL part but this should probably
evolved to handle more of the configuration and maybe even the state per
bind line.
2012-09-08 08:31:50 +02:00
Willy Tarreau
403edff4b8 MEDIUM: config: implement maxsslconn in the global section
SSL connections take a huge amount of memory, and unfortunately openssl
does not check malloc() returns and easily segfaults when too many
connections are used.

The only solution against this is to provide a global maxsslconn setting
to reject SSL connections above the limit in order to avoid reaching
unsafe limits.
2012-09-06 12:10:43 +02:00
David BERARD
e566ecbea8 MEDIUM: ssl: add support for prefer-server-ciphers option
I wrote a small path to add the SSL_OP_CIPHER_SERVER_PREFERENCE OpenSSL option
to frontend, if the 'prefer-server-ciphers' keyword is set.

Example :
	bind 10.11.12.13 ssl /etc/haproxy/ssl/cert.pem ciphers RC4:HIGH:!aNULL:!MD5 prefer-server-ciphers

This option mitigate the effect of the BEAST Attack (as I understand), and it
equivalent to :
	- Apache HTTPd SSLHonorCipherOrder option.
	- Nginx ssl_prefer_server_ciphers option.

[WT: added a test for the support of the option]
2012-09-04 15:35:32 +02:00
Willy Tarreau
ff9f7698fc BUILD: fix build error without SSL (ssl_cert)
One last-minute optimization broke the build without SSL support.
Move ssl_cert out of the #ifdef/#endif and it's OK.
2012-09-04 15:13:20 +02:00
Willy Tarreau
d50265aa0e BUILD: include sys/socket.h to fix build failure on FreeBSD
Joris Dedieu reported that include/common/standard.h needs this.
2012-09-04 14:18:33 +02:00
Willy Tarreau
783f25800c BUILD: http: rename error_message http_error_message to fix conflicts on RHEL
Duncan Hall reported a build issue on CentOS where error_message conflicts
with another system declaration when SSL is enabled. Rename the function.
2012-09-04 12:19:04 +02:00
Willy Tarreau
c230b8bfb6 MEDIUM: config: add "nosslv3" and "notlsv1" on bind and server lines
This is aimed at disabling SSLv3 and TLSv1 respectively. SSLv2 is always
disabled. This can be used in some situations where one version looks more
suitable than the other.
2012-09-03 23:55:16 +02:00
Willy Tarreau
d7aacbffcb MEDIUM: config: add a "ciphers" keyword to set SSL cipher suites
This is supported for both servers and listeners. The cipher suite
simply follows the "ciphers" keyword.
2012-09-03 23:43:25 +02:00
Emeric Brun
fc32acafcd MINOR: ssl add global setting tune.sslcachesize to set SSL session cache size.
This new global setting allows the user to change the SSL cache size in
number of sessions. It defaults to 20000.
2012-09-03 22:36:33 +02:00
Emeric Brun
3e541d1c03 MEDIUM: ssl: add shared memory session cache implementation.
This SSL session cache was developped at Exceliance and is the same that
was proposed for stunnel and stud. It makes use of a shared memory area
between the processes so that sessions can be handled by any process. It
is only useful when haproxy runs with nbproc > 1, but it does not hurt
performance at all with nbproc = 1. The aim is to totally replace OpenSSL's
internal cache.

The cache is optimized for Linux >= 2.6 and specifically for x86 platforms.
On Linux/x86, it makes use of futexes for inter-process locking, with some
x86 assembly for the locked instructions. On other architectures, GCC
builtins are used instead, which are available starting from gcc 4.1.

On other operating systems, the locks fall back to pthread mutexes so
libpthread is automatically linked. It is not recommended since pthreads
are much slower than futexes. The lib is only linked if SSL is enabled.
2012-09-03 22:36:33 +02:00
Emeric Brun
e1f38dbb44 MEDIUM: ssl: protect against client-initiated renegociation
CVE-2009-3555 suggests that client-initiated renegociation should be
prevented in the middle of data. The workaround here consists in having
the SSL layer notify our callback about a handshake occurring, which in
turn causes the connection to be marked in the error state if it was
already considered established (which means if a previous handshake was
completed). The result is that the connection with the client is immediately
aborted and any pending data are dropped.
2012-09-03 22:03:17 +02:00
Emeric Brun
01f8e2f61b MEDIUM: config: add support for the 'ssl' option on 'server' lines
This option currently takes no option and simply turns SSL on for all
connections going to the server. It is likely that more options will
be needed in the future.
2012-09-03 22:02:21 +02:00
Emeric Brun
6e159299f1 MEDIUM: config: add the 'ssl' keyword on 'bind' lines
"bind" now supports "ssl" followed by a PEM cert+key file name.
2012-09-03 20:49:14 +02:00
Emeric Brun
4659195e31 MEDIUM: ssl: add new files ssl_sock.[ch] to provide the SSL data layer
This data layer supports socket-to-buffer and buffer-to-socket operations.
No sock-to-pipe nor pipe-to-sock functions are provided, since splicing does
not provide any benefit with data transformation. At best it could save a
memcpy() and avoid keeping a buffer allocated but that does not seem very
useful.

An init function and a close function are provided because the SSL context
needs to be allocated/freed.

A data-layer shutw() function is also provided because upon successful
shutdown, we want to store the SSL context in the cache in order to reuse
it for future connections and avoid a new key generation.

The handshake function is directly called from the connection handler.
At this point it is not certain whether this will remain this way or
if a new ->handshake callback will be added to the data layer so that
the connection handler doesn't care about SSL.

The sock-to-buf and buf-to-sock functions are all capable of enabling
the SSL handshake at any time. This also implies polling in the opposite
direction to what was expected. The upper layers must take that into
account (it is OK right now with the stream interface).
2012-09-03 20:49:14 +02:00
Emeric Brun
7dd0e505ca MEDIUM: connection: add a new handshake flag for SSL (CO_FL_SSL_WAIT_HS).
This flag is part of the CO_FL_HANDSHAKE family since the SSL handshake
may appear at any time.
2012-09-03 20:49:14 +02:00
Emeric Brun
c6545acee0 MINOR: server: add SSL context to servers if USE_OPENSSL is defined
This will be needed to accept outgoing SSL connections.
2012-09-03 20:49:14 +02:00
Emeric Brun
0b8d4d9372 MINOR: protocol: add SSL context to listeners if USE_OPENSSL is defined
This will be needed to accept incoming SSL connections.
2012-09-03 20:49:14 +02:00
Willy Tarreau
dd2f85eb3b CLEANUP: includes: fix includes for a number of users of fd.h
It appears that fd.h includes a number of unneeded files and was
included from standard.h, and as such served as an intermediary
to provide almost everything to everyone.

By removing its useless includes, a long dependency chain broke
but could easily be fixed.
2012-09-03 20:49:14 +02:00
Willy Tarreau
45dab73788 CLEANUP: fdtab: flatten the struct and merge the spec struct with the rest
The "spec" sub-struct was using 8 bytes for only 5 needed. There is no
reason to keep it as a struct, it doesn't bring any value. By flattening
it, we can merge the single byte with the next single byte, resulting in
an immediate saving of 4 bytes (20%). Interestingly, tests have shown a
steady performance gain of 0.6% after this change, which can possibly be
attributed to a more cache-line friendly struct.
2012-09-03 20:49:14 +02:00
Willy Tarreau
40ff59d820 CLEANUP: fd: remove fdtab->flags
These flags were added for TCP_CORK. They were only set at various places
but never checked by any user since TCP_CORK was replaced with MSG_MORE.
Simply get rid of this now.
2012-09-03 20:49:14 +02:00
Willy Tarreau
56a77e5933 MEDIUM: connection: complete the polling cleanups
I/O handlers now all use __conn_{sock,data}_{stop,poll,want}_* instead
of returning dummy flags. The code has become slightly simpler because
some tricks such as the MIN_RET_FOR_READ_LOOP are not needed anymore,
and the data handlers which switch to a handshake handler do not need
to disable themselves anymore.
2012-09-03 20:47:35 +02:00
Willy Tarreau
e9dfa79a75 MAJOR: connection: rearrange the polling flags.
Polling flags were set for data and sock layer, but while this does make
sense for the ENA flag, it does not for the POL flag which translates the
detection of an EAGAIN condition. So now we remove the {DATA,SOCK}_POL*
flags and instead introduce two new layer-independant flags (WANT_RD and
WANT_WR). These flags are only set when an EAGAIN is encountered so that
polling can be enabled.

In order for these flags to have any meaning they are not persistent and
have to be cleared by the connection handler before calling the I/O and
data callbacks. For this reason, changes detection has been slightly
improved. Instead of comparing the WANT_* flags with CURR_*_POL, we only
check if the ENA status changes, or if the polling appears, since we don't
want to detect the useless poll to ena transition. Tests show that this
has eliminated one useless call to __fd_clr().

Finally the conn_set_polling() function which was becoming complex and
required complex operations from the caller was split in two and replaced
its two only callers (conn_update_data_polling and conn_update_sock_polling).
The two functions are now much smaller due to the less complex conditions.
Note that it would be possible to re-merge them and only pass a mask but
this does not appear much interesting.
2012-09-03 20:47:35 +02:00
Willy Tarreau
74172ff9c3 CLEANUP: frontend: remove the old proxy protocol decoder
This one used to rely on a stream analyser which was inappropriate.
It's not used anymore.
2012-09-03 20:47:35 +02:00
Willy Tarreau
22cda21ad5 MAJOR: connection: make the PROXY decoder a handshake handler
The PROXY protocol is now decoded in the connection before other
handshakes. This means that it may be extracted from a TCP stream
before SSL is decoded from this stream.
2012-09-03 20:47:35 +02:00
Willy Tarreau
2542b53b19 MAJOR: session: introduce embryonic sessions
When an incoming connection request is accepted, a connection
structure is needed to store its state. However we don't want to
fully initialize a session until the data layer is about to be
ready.

As long as the connection is physically stored into the session,
it's not easy to split both allocations.

As such, we only initialize the minimum requirements of a session,
which results in what we call an embryonic session. Then once the
data layer is ready, we can complete the function's initialization.

Doing so avoids buffers allocation and ensures that a session only
sees ready connections.

The frontend's client timeout is used as the handshake timeout. It
is likely that another timeout will be used in the future.
2012-09-03 20:47:35 +02:00
Willy Tarreau
15678efc45 MEDIUM: connection: add an ->init function to data layer
SSL need to initialize the data layer before proceeding with data. At
the moment, this data layer is automatically initialized from itself,
which will not be possible once we extract connection from sessions
since we'll only create the data layer once the handshake is finished.

So let's have the application layer initialize the data layer before
using it.
2012-09-03 20:47:34 +02:00
Willy Tarreau
64ee491309 MINOR: tcp: replace tcp_src_to_stktable_key with addr_to_stktable_key
Make it more obvious that this function does not depend on any knowledge
of the session. This is important to plan for TCP rules that can run on
connection without any initialized session yet.
2012-09-03 20:47:34 +02:00
Willy Tarreau
14f8e86da5 MEDIUM: proto_tcp: remove any dependence on stream_interface
The last uses of the stream interfaces were in tcp_connect_server() and
could easily and more appropriately be moved to its callers, si_connect()
and connect_server(), making a lot more sense.

Now the function should theorically be usable for health checks.

It also appears more obvious that the file is split into two distinct
parts :
  - the protocol layer used at the connection level
  - the tcp analysers executing tcp-* rules and their samples/acls.
2012-09-03 20:47:34 +02:00
Willy Tarreau
93b0f4f6c6 MEDIUM: stream_interface: remove CAP_SPLTCP/CAP_SPLICE flags
These ones are implicitly handled by the connection's data layer, no need
to rely on them anymore and reaching them maintains undesired dependences
on stream-interface.
2012-09-03 20:47:34 +02:00
Willy Tarreau
986a9d2d12 MAJOR: connection: move the addr field from the stream_interface
We need to have the source and destination addresses in the connection.
They were lying in the stream interface so let's move them. The flags
SI_FL_FROM_SET and SI_FL_TO_SET have been moved as well.

It's worth noting that tcp_connect_server() almost does not use the
stream interface anymore except for a few flags.

It has been identified that once we detach the connection from the SI,
it will probably be needed to keep a copy of the server-side addresses
in the SI just for logging purposes. This has not been implemented right
now though.
2012-09-03 20:47:34 +02:00
Willy Tarreau
3cefd521fa REORG: connection: move the target pointer from si to connection
The target is per connection and is directly used by the connection, so
we need it there. It's not needed anymore in the SI however.
2012-09-03 20:47:34 +02:00
Willy Tarreau
8263d2b259 CLEANUP: channel: use "channel" instead of "buffer" in function names
This is a massive rename of most functions which should make use of the
word "channel" instead of the word "buffer" in their names.

In concerns the following ones (new names) :

unsigned long long channel_forward(struct channel *buf, unsigned long long bytes);
static inline void channel_init(struct channel *buf)
static inline int channel_input_closed(struct channel *buf)
static inline int channel_output_closed(struct channel *buf)
static inline void channel_check_timeouts(struct channel *b)
static inline void channel_erase(struct channel *buf)
static inline void channel_shutr_now(struct channel *buf)
static inline void channel_shutw_now(struct channel *buf)
static inline void channel_abort(struct channel *buf)
static inline void channel_stop_hijacker(struct channel *buf)
static inline void channel_auto_connect(struct channel *buf)
static inline void channel_dont_connect(struct channel *buf)
static inline void channel_auto_close(struct channel *buf)
static inline void channel_dont_close(struct channel *buf)
static inline void channel_auto_read(struct channel *buf)
static inline void channel_dont_read(struct channel *buf)
unsigned long long channel_forward(struct channel *buf, unsigned long long bytes)

Some functions provided by channel.[ch] have kept their "buffer" name because
they are really designed to act on the buffer according to some information
gathered from the channel. They have been moved together to the same place in
the file for better readability but they were not changed at all.

The "buffer" memory pool was also renamed "channel".
2012-09-03 20:47:33 +02:00
Willy Tarreau
03cdb7c678 CLEANUP: channel: usr CF_/CHN_ prefixes instead of BF_/BUF_
Get rid of these confusing BF_* flags. Now channel naming should clearly
be used everywhere appropriate.

No code was changed, only a renaming was performed. The comments about
channel operations was updated.
2012-09-03 20:47:33 +02:00
Willy Tarreau
af81935b82 REORG: channel: move buffer_{replace,insert_line}* to buffer.{c,h}
These functions do not depend on the channel flags anymore thus they're
much better suited to be used on plain buffers. Move them from channel
to buffer.
2012-09-03 20:47:33 +02:00
Willy Tarreau
f941cf2ef2 MAJOR: channel: remove the BF_FULL flag
This is similar to the recent removal of BF_OUT_EMPTY. This flag was very
problematic because it relies on permanently changing information such as the
to_forward value, so it had to be updated upon every change to the buffers.
Previous patch already got rid of its users.

One part of the change is sensible : the flag was also part of BF_MASK_STATIC,
which is used by process_session() to rescan all analysers in case the flag's
status changes. At first glance, none of the analysers seems to change its
mind base on this flag when it is subject to change, so it seems fine not to
add variation checks here. Otherwise it's possible that checking the buffer's
input and output is more reliable than checking the flag's replacement.
2012-09-03 20:47:33 +02:00
Willy Tarreau
42d06661a2 MINOR: buffer: provide a new buffer_full() function
This one only focuses on the input part of the buffer and is dedicated
to analysers.
2012-09-03 20:47:33 +02:00
Willy Tarreau
ad1cc3df9c MINOR: channel: rename bi_full to channel_full as it checks the whole channel
Since the function takes care of the forward count and involves more than
buffer knowledge, rename it.
2012-09-03 20:47:32 +02:00
Willy Tarreau
a75bcef867 REORG: buffer: move buffer_flush, b_adv and b_rew to buffer.h
These one now operate over real buffers, not channels anymore.
2012-09-03 20:47:32 +02:00
Willy Tarreau
8e21bb9e52 MAJOR: channel: remove the BF_OUT_EMPTY flag
This flag was very problematic because it was composite in that both changes
to the pipe or to the buffer had to cause this flag to be updated, which is
not always simple (eg: there may not even be a channel attached to a buffer
at all).

There were not that many users of this flags, mostly setters. So the flag got
replaced with a macro which reports whether the channel is empty or not, by
checking both the pipe and the buffer.

One part of the change is sensible : the flag was also part of BF_MASK_STATIC,
which is used by process_session() to rescan all analysers in case the flag's
status changes. At first glance, none of the analysers seems to change its
mind base on this flag when it is subject to change, so it seems fine not to
add variation checks here. Otherwise it's possible that checking the buffer's
output size is more useful than checking the flag's replacement.
2012-09-03 20:47:32 +02:00
Willy Tarreau
c7e4238df0 REORG: buffers: split buffers into chunk,buffer,channel
Many parts of the channel definition still make use of the "buffer" word.
2012-09-03 20:47:32 +02:00
Willy Tarreau
c578891112 CLEANUP: connection: split sock_ops into data_ops, app_cp and si_ops
Some parts of the sock_ops structure were only used by the stream
interface and have been moved into si_ops. Some of them were callbacks
to the stream interface from the connection and have been moved into
app_cp as they're the application seen from the connection (later,
health-checks will need to use them). The rest has moved to data_ops.

Normally at this point the connection could live without knowing about
stream interfaces at all.
2012-09-03 20:47:31 +02:00
Willy Tarreau
96199b1016 MAJOR: stream-interface: restore splicing mechanism
The splicing is now provided by the data-layer rcv_pipe/snd_pipe functions
which in turn are called by the stream interface's recv and send callbacks.

The presence of the rcv_pipe/snd_pipe functions is used to attest support
for splicing at the data layer. It looks like the stream-interface's
SI_FL_CAP_SPLICE flag does not make sense anymore as it's used as a proxy
for the pointers above.

It also appears that we call chk_snd() from the recv callback and then
try to call it again in update_conn(). It is very likely that this last
function will progressively slip into the recv/send callbacks in order
to avoid duplicate check code.

The code works right now with and without splicing. Only raw_sock provides
support for it and it is automatically selected when the various splice
options are set. However it looks like splice-auto doesn't enable it, which
possibly means that the streamer detection code does not work anymore, or
that it's only called at a time where it's too late to enable splicing (in
process_session).
2012-09-03 20:47:31 +02:00
Willy Tarreau
5368d80ede MAJOR: connection: split the send call into connection and stream interface
Similar to what was done on the receive path, the data layer now provides
only an snd_buf() callback that is iterated over by the stream interface's
si_conn_send_loop() function.

The data layer now has no knowledge about channels nor stream interfaces.

The splice() code still need to be ported as it currently is disabled.
2012-09-03 20:47:31 +02:00
Willy Tarreau
ce323dea14 REORG: stream-interface: move sock_raw_read() to si_conn_recv_cb()
The recv function is now generic and is usable to iterate any connection-to-buf
reading function from a stream interface. So let's move it to stream-interface.
2012-09-03 20:47:30 +02:00
Willy Tarreau
1fe6bc335a MINOR: stream-interface: add an rcv_buf callback to sock_ops
This one is to be used by the read I/O handlers.
2012-09-03 20:47:30 +02:00
Willy Tarreau
2ba4465086 MAJOR: raw_sock: extract raw_sock_to_buf() from raw_sock_read()
This is the start of the stream connection iterator which calls the
data-layer reader. This still looks a bit tricky but is OK. Splicing
is not handled at all at the moment.
2012-09-03 20:47:30 +02:00
Willy Tarreau
75bf2c925f REORG: sock_raw: rename the files raw_sock*
The "raw_sock" prefix will be more convenient for naming functions as
it will be prefixed with the data layer and suffixed with the data
direction. So let's rename the files now to avoid any further confusion.

The #include directive was also removed from a number of files which do
not need it anymore.
2012-09-02 21:54:56 +02:00
Willy Tarreau
3af56a9359 MINOR: connection: provide conn_{data|sock}_{read0|shutw} functions
These functions are used to report unidirectional shutdown and to disable
polling in the related direction.
2012-09-02 21:54:56 +02:00
Willy Tarreau
572bf9095d REORG/MAJOR: extract "struct buffer" from "struct channel"
At the moment, the struct is still embedded into the struct channel, but
all the functions have been updated to use struct buffer only when possible,
otherwise struct channel. Some functions would likely need to be splitted
between a buffer-layer primitive and a channel-layer function.

Later the buffer should become a pointer in the struct buffer, but doing so
requires a few changes to the buffer allocation calls.
2012-09-02 21:54:56 +02:00
Willy Tarreau
7421efb85f REORG/MAJOR: use "struct channel" instead of "struct buffer"
This is a massive rename. We'll then split channel and buffer.

This change needs a lot of cleanups. At many locations, the parameter
or variable is still called "buf" which will become ambiguous. Also,
the "struct channel" is still defined in buffers.h.
2012-09-02 21:54:55 +02:00
Willy Tarreau
9bf9c14c12 MEDIUM: stream-interface: provide a generic stream_sock_read0() function
This function is used by the data layer when a zero has been read over a
connection. At the moment it only handles sockets and nothing else. Once
the complete split is done between buffers and stream interfaces, it should
become possible to work regardless on the connection type.
2012-09-02 21:54:55 +02:00
Willy Tarreau
eecf6ca68a MEDIUM: stream-interface: provide a generic si_conn_send_cb callback
The connection send() callback is supposed to be generic for a
stream-interface, and consists in calling the lower layer snd_buf
function. Move this function to the stream interface and remove
the sock-raw and sock-ssl clones.
2012-09-02 21:54:55 +02:00
Willy Tarreau
de5722c302 MEDIUM: stream-interface: provide a generic stream_int_chk_snd_conn() function
This one can be used by both sock_raw and sock_ssl instead of each having their own.
2012-09-02 21:54:55 +02:00