Commit Graph

1865 Commits

Author SHA1 Message Date
Willy Tarreau
9fbe18e174 MEDIUM: http: add a new option http-buffer-request
It is sometimes desirable to wait for the body of an HTTP request before
taking a decision. This is what is being done by "balance url_param" for
example. The first use case is to buffer requests from slow clients before
connecting to the server. Another use case consists in taking the routing
decision based on the request body's contents. This option placed in a
frontend or backend forces the HTTP processing to wait until either the whole
body is received, or the request buffer is full, or the first chunk is
complete in case of chunked encoding. It can have undesired side effects with
some applications abusing HTTP by expecting unbufferred transmissions between
the frontend and the backend, so this should definitely not be used by
default.

Note that it would not work for the response because we don't reset the
message state before starting to forward. For the response we need to
1) reset the message state to MSG_100_SENT or BODY , and 2) to reset
body_len in case of chunked encoding to avoid counting it twice.
2015-05-02 00:10:44 +02:00
Willy Tarreau
748179eb5a MEDIUM: stream: move HTTP request body analyser before process_common
Since 1.5, the request body analyser has become independant from any
other element and does not even disturb the message forwarder anymore.
And since it's disabled by default, we can place it before most
analysers so that it's can preempt any other one if an intermediary
one enables it.
2015-05-02 00:10:44 +02:00
Willy Tarreau
30fe818979 DOC: fix the comments about the meaning of msg->sol in HTTP
It has a meaning while parsing a body when using chunked encoding.
This must be backported to 1.5 since it caused a bug there as well.
2015-05-01 23:24:31 +02:00
Willy Tarreau
aa729784e1 MINOR: peers: store the pointer to the signal handler
We'll need it to unregister stopped peers sections.
2015-05-01 20:16:31 +02:00
Willy Tarreau
0f228a037a MEDIUM: http: add option-ignore-probes to get rid of the floods of 408
Recently some browsers started to implement a "pre-connect" feature
consisting in speculatively connecting to some recently visited web sites
just in case the user would like to visit them. This results in many
connections being established to web sites, which end up in 408 Request
Timeout if the timeout strikes first, or 400 Bad Request when the browser
decides to close them first. These ones pollute the log and feed the error
counters. There was already "option dontlognull" but it's insufficient in
this case. Instead, this option does the following things :
   - prevent any 400/408 message from being sent to the client if nothing
     was received over a connection before it was closed ;
   - prevent any log from being emitted in this situation ;
   - prevent any error counter from being incremented

That way the empty connection is silently ignored. Note that it is better
not to use this unless it is clear that it is needed, because it will hide
real problems. The most common reason for not receiving a request and seeing
a 408 is due to an MTU inconsistency between the client and an intermediary
element such as a VPN, which blocks too large packets. These issues are
generally seen with POST requests as well as GET with large cookies. The logs
are often the only way to detect them.

This patch should be backported to 1.5 since it avoids false alerts and
makes it easier to monitor haproxy's status.
2015-05-01 15:39:23 +02:00
Willy Tarreau
f3045d2a06 MAJOR: pattern: add LRU-based cache on pattern matching
The principle of this cache is to have a global cache for all pattern
matching operations which rely on lists (reg, sub, dir, dom, ...). The
input data, the expression and a random seed are used as a hashing key.
The cached entries contains a pointer to the expression and a revision
number for that expression so that we don't accidently used obsolete
data after a pattern update or a very unlikely hash collision.

Regarding the risk of collisions, 10k entries at 10k req/s mean 1% risk
of a collision after 60 years, that's already much less than the memory's
reliability in most machines and more durable than most admin's life
expectancy. A collision will result in a valid result to be returned
for a different entry from the same list. If this is not acceptable,
the cache can be disabled using tune.pattern.cache-size.

A test on a file containing 10k small regex showed that the regex
matching was limited to 6k/s instead of 70k with regular strings.
When enabling the LRU cache, the performance was back to 70k/s.
2015-04-29 19:15:24 +02:00
Willy Tarreau
72f073b6c7 MEDIUM: pattern: add a revision to all pattern expressions
This will be used to detect any change on the pattern list between
two operations, ultimately making it possible to implement a cache
which immediately invalidates obsolete keys after an update. The
revision is simply taken from the timestamp counter to ensure that
even upon a pointer reuse we cannot accidently come back to the
same (expr,revision) tuple.
2015-04-29 19:15:24 +02:00
Willy Tarreau
b5684e0081 IMPORT: hash: import xxhash-r39
The xxhash library provides a very fast and excellent hash algorithm
suitable for many purposes. It excels at hashing large blocks but is
also extremely fast on small ones. It's distributed under a 2-clause
BSD license (GPL-compatible) so it can be included here. Updates are
distributed here :

      https://github.com/Cyan4973/xxHash
2015-04-29 19:15:21 +02:00
Willy Tarreau
69c696c138 IMPORT: lru: import simple ebtree-based LRU functions
This will be usable to implement some maps/acl caches for heavy datasets
loaded from files (mostly regex-based but in general anything that cannot
be indexed in a tree).
2015-04-29 19:14:43 +02:00
Willy Tarreau
e6e49cfa93 MINOR: tools: provide an rdtsc() function for time comparisons
This one returns a timestamp, either the one from the CPU or from
gettimeofday() in 64-bit format. The purpose is to be able to compare
timestamps on various entities to make it easier to detect updates.
It can also be used for benchmarking in certain situations during
development.
2015-04-29 19:14:03 +02:00
Andrew Hayworth
0ebc55f6b4 MEDIUM: logs: Add HTTP request-line log format directives
This commit adds 4 new log format variables that parse the
HTTP Request-Line for more specific logging than "%r" provides.

For example, we can parse the following HTTP Request-Line with
these new variables:

  "GET /foo?bar=baz HTTP/1.1"

- %HM: HTTP Method ("GET")
- %HV: HTTP Version ("HTTP/1.1")
- %HU: HTTP Request-URI ("/foo?bar=baz")
- %HP: HTTP Request-URI without query string ("/foo")
2015-04-28 21:03:05 +02:00
Willy Tarreau
e5843b383d BUG/MEDIUM: peers: recent applet changes broke peers updates scheduling
Since appctx are scheduled out of streams, it's pointless to wake up
the task managing the stream to push updates, they won't be seen. In
fact unit tests work because silent sessions are restarted after 5s of
idle and the exchange is correctly scheduled during startup!

So we need to notify the appctx instead. For this we add a pointer to
the appctx in the peer session.

No backport is needed of course.
2015-04-27 18:42:17 +02:00
Willy Tarreau
eb406dc73c MINOR: stream-int: add two flags to indicate an applet's wishes regarding I/O
Currently we have a problem. There are some cases where a sleeping applet
is not woken up (eg: show sess during an injection). The reason is that
the applet is marked WAIT_DATA and is not woken up when WAIT_ROOM leaves,
because we wait for both flags to be cleared in order to call it.

And if we wait for either flag, then we have the opposite situation, which
is that we're not waiting for room in the output buffer so we're spinning
calling the applet to do nothing.

What is missing is an indication of what the applet needs. Since it only
manipulates the WAIT_ROOM/WAIT_DATA which are overwritten later, that cannot
work. In the case of connections, the problem doesn't happen because the
connection maintains these extra states. Ideally we'd need to have similar
states for each appctx and to store those information there. But it would
be overcomplicated given that an applet doesn't exist alone without a
stream-int, so we can safely put these information into the stream int and
make the code simpler.

With this patch we introduce two new flags in the stream interface :
  - SI_FL_WANT_PUT : the applet wants to put something into the buffer
  - SI_FL_WANT_GET : the applet wants to get something from the buffer

We also have the new functions si_applet_{stop|want|cant}_{get|put}
to make the code look similar to the connection code.

For now these flags are not used yet.
2015-04-23 17:56:17 +02:00
Willy Tarreau
e5f8649102 MEDIUM: stream-int: add a new function si_applet_done()
This is the equivalent of si_conn_wake() but for applets. It will be
called after changes to the stream interface are brought by the applet
I/O handler. Ultimately it will release buffers and may be even wake
the stream's task up if some important changes are detected.

It would be nice to be able to merge it with the connection's wake
function since it mostly manipulates the stream interface, but there
are minor differences (such as how to enable/disable polling on a fd
vs applet) and some specificities to applets (eg: don't wake the
applet up until the output is empty) which would require abstract
functions which would slow down everything.
2015-04-23 17:56:16 +02:00
Willy Tarreau
3c595ac3ad MEDIUM: applet: implement a run queue for active appctx
The new function is called for each round of polling in order to call any
active appctx. For now we pick the stream interface from the appctx's
owner. At the moment there's no appctx queued yet, but we have everything
needed to queue them and remove them.
2015-04-23 17:56:16 +02:00
Willy Tarreau
81f38d6f57 MEDIUM: applet: add basic support for an applet run queue
This will be needed so that we can schedule applets out of the streams.
For now nothing calls the queue yet.
2015-04-23 17:56:16 +02:00
Willy Tarreau
d45b9f8991 REORG: stream-int: create si_applet_ops dedicated to applets
These functions are dedicated to applets so that we don't use the default
ones anymore in this case.
2015-04-23 17:56:16 +02:00
Willy Tarreau
3057645b37 CLEANUP: applet: rename struct si_applet to applet
Since this one does not depend on stream_interface anymore, remove the
"si_" prefix.
2015-04-23 17:56:16 +02:00
Willy Tarreau
8a8d83b85c REORG: applet: move the applet definitions out of stream_interface
We're tidying the definitions so that appctx lives on its own. A new
set of applet.h files has been added for this purpose.
2015-04-23 17:56:16 +02:00
Willy Tarreau
00a37f0029 MEDIUM: applet: make the applet not depend on a stream interface anymore
Now that applet's functions only take an appctx in argument, not a
stream interface. This slightly simplifies the code and will be needed
to take the appctx out of the stream interface.
2015-04-23 17:56:16 +02:00
Willy Tarreau
19c8161b3d MINOR: applet: add a new "owner" pointer in the appctx
This pointer indicates what stream-interface the appctx belongs to, just
like we have for the connections.
2015-04-23 17:56:16 +02:00
Willy Tarreau
7365dad40f BUG/MEDIUM: stream-int: always reset si->ops when si->end is nullified
It happened after changing the stream interface deinitialization
sequence that we got random crashes with si_shutw() being called
on NULL si->end. The reason was that si->ops was not reset after
a call to si_release_endpoint() which is sometimes called directly.

Thus we now move the resetting of si->ops just after any si->end
assignment. It happens that si_detach() is now just the same as
si_release_endpoint() and stream_int_unregister_handler(). Some
cleanup will have to be performed there.

It's not sure whether this problem can impact 1.5 since in 1.5
applets are part of the default embedded stream handler. The only
way it could cause some trouble is if it's used with a connection,
which doesn't seem possible at first glance.
2015-04-21 14:15:22 +02:00
Willy Tarreau
152b81e7b2 BUG/MAJOR: tcp/http: fix current_rule assignment when restarting over a ruleset
Commit bc4c1ac ("MEDIUM: http/tcp: permit to resume http and tcp custom
actions") introduced the ability to interrupt and restart processing in
the middle of a TCP/HTTP ruleset. But it doesn't do it in a consistent
way : it checks current_rule_list, immediately dereferences current_rule,
which is only set in certain cases and never cleared. So that broke the
tcp-request content rules when the processing was interrupted due to
missing data, because current_rule was not yet set (segfault) or could
have been inherited from another ruleset if it was used in a backend
(random behaviour).

The proper way to do it is to always set current_rule before dereferencing
it. But we don't want to set it for all rules because we don't want any
action to provide a checkpointing mechanism. So current_rule is set to NULL
before entering the loop, and only used if not NULL and if current_rule_list
matches the current list. This way they both serve as a guard for the other
one. This fix also makes the current rule point to the rule instead of its
list element, as it's much easier to manipulate.

No backport is needed, this is 1.6-specific.
2015-04-20 13:46:20 +02:00
CJ Ess
108b1dd69d MEDIUM: http: configurable http result codes for http-request deny
This patch adds support for error codes 429 and 405 to Haproxy and a
"deny_status XXX" option to "http-request deny" where you can specify which
code is returned with 403 being the default. We really want to do this the
"haproxy way" and hope to have this patch included in the mainline. We'll
be happy address any feedback on how this is implemented.
2015-04-11 10:34:54 +02:00
Willy Tarreau
73b65acd46 MINOR: stream: pass the pointer to the origin explicitly to stream_new()
We don't pass sess->origin anymore but the pointer to the previous step. Now
it should be much easier to chain elements together once applets are moved out
of streams. Indeed, the session is only used for configuration and not for the
dynamic chaining anymore.
2015-04-08 18:26:29 +02:00
Willy Tarreau
1b90511eeb CLEANUP: namespaces: fix protection against multiple inclusions
The include file did not protect correctly against multiple inclusions,
as it didn't define the file name after checking for it. That's currently
harmless as the file is only included from .c but that could change.
2015-04-08 17:31:40 +02:00
Thierry FOURNIER
3def393f8d MINOR: lua: map system integration in Lua
This patch cretes a new Map class that permits to do some lookup in
HAProxy maps. This Map class is integration in the HAProxy update
system, so we can modify the map throught the socket.
2015-04-07 15:56:21 +02:00
Willy Tarreau
d414f8e8b7 CLEANUP: stream-int: swap stream-int and appctx declarations
This is just in order to remove two forward declarations of si_applet
and stream_interface that are not needed once properly ordered.
2015-04-06 11:43:45 +02:00
Willy Tarreau
02d863866d MEDIUM: stream: return the stream upon accept()
The function was called stream_accept_session(), let's rename it
stream_new() and make it return the newly allocated pointer. It's
more convenient for some callers who need it.
2015-04-06 11:37:34 +02:00
Willy Tarreau
c38f71cfcd MINOR: session: introduce session_new()
This one creates a new session and does the minimum initialization.
2015-04-06 11:37:33 +02:00
Willy Tarreau
a7513f5d00 MINOR: stream-int: make appctx_new() take the applet in argument
Doing so simplifies the initialization of a new appctx. We don't
need appctx_set_applet() anymore.
2015-04-06 11:37:32 +02:00
Willy Tarreau
9903f0e1a2 REORG: session: move the session parts out of stream.c
This concerns everythins related to accepting a new session and
expiring the embryonic session. There's still a hard-coded call
to stream_accept_session() which could be set somewhere in the
frontend, but for now it's not a problem.
2015-04-06 11:37:32 +02:00
Willy Tarreau
32990b531b MEDIUM: session: remove the task pointer from the session
Now that the previous changes were made, we can add a struct task
pointer to stream_complete() and get rid of it in struct session.

The new relation between connection, session and task are like this :

          orig -- sess <-- context
           |                   |
           v                   |
          conn -- owner ---> task

Some session-specific parts should now move away from stream.
2015-04-06 11:37:32 +02:00
Willy Tarreau
02a0c0e407 MAJOR: stream: don't initialize the stream anymore in stream_accept
The function now only initializes a session, calls the tcp req connection
rules, and calls stream_complete() to finish initialization. If a handshake
is needed, it is done without allocating the stream at all.

Temporarily, in order to limit the amount of changes, the task allocated
is put into sess->task, and it is used by the connection for the handshake
or is offered to the stream. At this point we set the relation between
sess/task/conn this way :

        orig -- sess  <-- context
         |       ^ +- task -+  |
         v       |          v  |
        conn -- owner       task

The task must not remain in the session and ultimately it is planned to
remove this task pointer from the session because it can be found by
having conn->owner = task, and looping back from sess to conn, and to
find the session from the connection via the task.
2015-04-06 11:37:32 +02:00
Willy Tarreau
e73ef85a63 MAJOR: tcp: make tcp_exec_req_rules() only rely on the session
It passes a NULL wherever a stream was needed (acl_exec_cond() and
action_ptr mainly). It can still track the connection rate correctly
and block based on ACLs.
2015-04-06 11:37:31 +02:00
Willy Tarreau
bb2ef12a60 MEDIUM: session: update the session's stick counters upon session_free()
Whenever session_free() is called, any possible stick counter stored in
the session will be synchronized.
2015-04-06 11:37:31 +02:00
Willy Tarreau
8b7f8688ee MEDIUM: streams: support looking up stkctr in the session
In order to support sessions tracking counters, we first ensure that there
is no overlap between streams' stkctr and sessions', and we allow an
automatic lookup into the session's counters when the stream doesn't
have a counter or when the stream doesn't exist during an access via
a sample fetch. The functions used to update the stream counters only
update them and not the session counters however.
2015-04-06 11:37:31 +02:00
Willy Tarreau
7698c9080a REORG: stktable: move the stkctr_* functions from stream to sticktable
These ones are not stream-specific at all and will be needed outside of
stream, so let's move them to stick_tables where struct stkctr is defined.
2015-04-06 11:37:30 +02:00
Willy Tarreau
b2bf8331fb MINOR: session: add stick counters to the struct session
The stick counters in the session will be used for everything not related
to contents, hence the connections / concurrent sessions / etc. They will
be usable by "tcp-request connection" rules even without a stream. For now
they're just allocated and initialized.
2015-04-06 11:37:30 +02:00
Willy Tarreau
11c3624c32 MINOR: session: implement session_free() and use it everywhere
We want to call this one everywhere we have to kill a session so
that future parts we move to the session can be released from there.
2015-04-06 11:37:30 +02:00
Willy Tarreau
7ea671b914 MINOR: session: store the session's accept date
Doing so ensures we don't need to use the stream anymore to prepare the
log information to report a failed handshake on an embryonic session.
Thus, prepare_mini_sess_log_prefix() now takes a session in argument.
2015-04-06 11:37:30 +02:00
Willy Tarreau
1f52bb27ec CLEANUP: stream: don't set ->target to the incoming connection anymore
Now that we have sess->origin to carry that information along, we don't
need to put that into strm->target anymore, so we remove one dependence
on the stream in embryonic connections.
2015-04-06 11:37:29 +02:00
Willy Tarreau
d0d8da989b MINOR: stream: provide a few helpers to retrieve frontend, listener and origin
Expressions are quite long when using strm_sess(strm)->whatever, so let's
provide a few helpers : strm_fe(), strm_li(), strm_orig().
2015-04-06 11:37:29 +02:00
Willy Tarreau
192252e2d8 MAJOR: sample: pass a pointer to the session to each sample fetch function
Many such function need a session, and till now they used to dereference
the stream. Once we remove the stream from the embryonic session, this
will not be possible anymore.

So as of now, sample fetch functions will be called with this :

   - sess = NULL,  strm = NULL                     : never
   - sess = valid, strm = NULL                     : tcp-req connection
   - sess = valid, strm = valid, strm->txn = NULL  : tcp-req content
   - sess = valid, strm = valid, strm->txn = valid : http-req / http-res
2015-04-06 11:37:25 +02:00
Willy Tarreau
987e3fb868 MEDIUM: http: remove the now useless http_txn from {req/res} rules
The registerable http_req_rules / http_res_rules used to require a
struct http_txn at the end. It's redundant with struct stream and
propagates very deep into some parts (ie: it was the reason for lua
requiring l7). Let's remove it now.
2015-04-06 11:35:53 +02:00
Willy Tarreau
f3bf3050a1 CLEANUP: lua: remove unused hlua_smp->l7 and hlua_txn->l7
Since last commit, we don't retrieve the HTTP transaction from there
anymore, so these entries can go.
2015-04-06 11:35:53 +02:00
Willy Tarreau
15e91e1b36 MAJOR: sample: don't pass l7 anymore to sample fetch functions
All of them can now retrieve the HTTP transaction *if it exists* from
the stream and be sure to get NULL there when called with an embryonic
session.

The patch is a bit large because many locations were touched (all fetch
functions had to have their prototype adjusted). The opportunity was
taken to also uniformize the call names (the stream is now always "strm"
instead of "l4") and to fix indent where it was broken. This way when
we later introduce the session here there will be less confusion.
2015-04-06 11:35:53 +02:00
Willy Tarreau
eee5b51248 MAJOR: http: move http_txn out of struct stream
Now this one is dynamically allocated. It means that 280 bytes of memory
are saved per TCP stream, but more importantly that it will become
possible to remove the l7 pointer from fetches and converters since
it will be deduced from the stream and will support being null.

A lot of care was taken because it's easy to forget a test somewhere,
and the previous code used to always trust s->txn for being valid, but
all places seem to have been visited.

All HTTP fetch functions check the txn first so we shouldn't have any
issue there even when called from TCP. When branching from a TCP frontend
to an HTTP backend, the txn is properly allocated at the same time as the
hdr_idx.
2015-04-06 11:35:52 +02:00
Willy Tarreau
cb7dd015be MEDIUM: http: move header captures from http_txn to struct stream
The header captures are now general purpose captures since tcp rules
can use them to capture various contents. That removes a dependency
on http_txn that appeared in some sample fetch functions and in the
order by which captures and http_txn were allocated.

Interestingly the reset of the header captures were done at too many
places as http_init_txn() used to do it while it was done previously
in every call place.
2015-04-06 11:35:52 +02:00
Willy Tarreau
40606ab976 MINOR: session: add a pointer to the session's origin
A session's origin is the entity that was responsible for creating
the session. It can be an applet or a connection for now.
2015-04-06 11:23:58 +02:00
Willy Tarreau
e36cbcb3b0 MEDIUM: stream: move the frontend's pointer to the session
Just like for the listener, the frontend is session-wide so let's move
it to the session. There are a lot of places which were changed but the
changes are minimal in fact.
2015-04-06 11:23:58 +02:00
Willy Tarreau
fb0afa77c9 MEDIUM: stream: move the listener's pointer to the session
The listener is session-specific, move it there.
2015-04-06 11:23:57 +02:00
Willy Tarreau
b1ec8c4a59 MINOR: session: start to reintroduce struct session
There is now a pointer to the session in the stream, which is NULL
for now. The session pool is created as well. Some parts will move
from the stream to the session now.
2015-04-06 11:23:57 +02:00
Willy Tarreau
e7dff02dd4 REORG/MEDIUM: stream: rename stream flags from SN_* to SF_*
This is in order to keep things consistent.
2015-04-06 11:23:57 +02:00
Willy Tarreau
87b09668be REORG/MAJOR: session: rename the "session" entity to "stream"
With HTTP/2, we'll have to support multiplexed streams. A stream is in
fact the largest part of what we currently call a session, it has buffers,
logs, etc.

In order to catch any error, this commit removes any reference to the
struct session and tries to rename most "session" occurrences in function
names to "stream" and "sess" to "strm" when that's related to a session.

The files stream.{c,h} were added and session.{c,h} removed.

The session will be reintroduced later and a few parts of the stream
will progressively be moved overthere. It will more or less contain
only what we need in an embryonic session.

Sample fetch functions and converters will have to change a bit so
that they'll use an L5 (session) instead of what's currently called
"L4" which is in fact L6 for now.

Once all changes are completed, we should see approximately this :

   L7 - http_txn
   L6 - stream
   L5 - session
   L4 - connection | applet

There will be at most one http_txn per stream, and a same session will
possibly be referenced by multiple streams. A connection will point to
a session and to a stream. The session will hold all the information
we need to keep even when we don't yet have a stream.

Some more cleanup is needed because some code was already far from
being clean. The server queue management still refers to sessions at
many places while comments talk about connections. This will have to
be cleaned up once we have a server-side connection pool manager.
Stream flags "SN_*" still need to be renamed, it doesn't seem like
any of them will need to move to the session.
2015-04-06 11:23:56 +02:00
Willy Tarreau
418b8c0c41 MAJOR: compression: integrate support for libslz
This library is designed to emit a zlib-compatible stream with no
memory usage and to favor resource savings over compression ratio.
While zlib requires 256 kB of RAM per compression context (and can only
support 4000 connections per GB of RAM), the stateless compression
offered by libslz does not need to retain buffers between subsequent
calls. In theory this slightly reduces the compression ratio but in
practice it does not have that much of an effect since the zlib
window is limited to 32kB.

Libslz is available at :

      http://git.1wt.eu/web?p=libslz.git

It was designed for web compression and provides a lot of savings
over zlib in haproxy. Here are the preliminary results on a single
core of a core2-quad 3.0 GHz in 32-bit for only 300 concurrent
sessions visiting the home page of www.haproxy.org (76 kB) with
the default 16kB buffers :

          BW In      BW Out     BW Saved   Ratio   memory VSZ/RSS
zlib      237 Mbps    92 Mbps   145 Mbps   2.58     84M /  69M
slz       733 Mbps   380 Mbps   353 Mbps   1.93    5.9M / 4.2M

So while the compression ratio is lower, the bandwidth savings are
much more important due to the significantly lower compression cost
which allows to consume even more data from the servers. In the
example above, zlib became the bottleneck at 24% of the output
bandwidth. Also the difference in memory usage is obvious.

More tests run on a single core of a core i5-3320M, with 500 concurrent
users and the default 16kB buffers :

At 100% CPU (no limit) :
          BW In      BW Out     BW Saved   Ratio   memory VSZ/RSS  hits/s
zlib      480 Mbps   188 Mbps   292 Mbps   2.55     130M / 101M     744
slz      1700 Mbps   810 Mbps   890 Mbps   2.10    23.7M / 9.7M    2382

At 85% CPU (limited) :
          BW In      BW Out     BW Saved   Ratio   memory VSZ/RSS  hits/s
zlib     1240 Mbps   976 Mbps   264 Mbps   1.27     130M / 100M    1738
slz      1600 Mbps   976 Mbps   624 Mbps   1.64    23.7M / 9.7M    2210

The most important benefit really happens when the CPU usage is
limited by "maxcompcpuusage" or the BW limited by "maxcomprate" :
in order to preserve resources, haproxy throttles the compression
ratio until usage is within limits. Since slz is much cheaper, the
average compression ratio is much higher and the input bandwidth
is quite higher for one Gbps output.

Other tests made with some reference files :

                           BW In     BW Out    BW Saved  Ratio  hits/s
daniels.html       zlib  1320 Mbps  163 Mbps  1157 Mbps   8.10    1925
                   slz   3600 Mbps  580 Mbps  3020 Mbps   6.20    5300

tv.com/listing     zlib   980 Mbps  124 Mbps   856 Mbps   7.90     310
                   slz   3300 Mbps  553 Mbps  2747 Mbps   5.97    1100

jquery.min.js      zlib   430 Mbps  180 Mbps   250 Mbps   2.39     547
                   slz   1470 Mbps  764 Mbps   706 Mbps   1.92    1815

bootstrap.min.css  zlib   790 Mbps  165 Mbps   625 Mbps   4.79     777
                   slz   2450 Mbps  650 Mbps  1800 Mbps   3.77    2400

So on top of saving a lot of memory, slz is constantly 2.5-3.5 times
faster than zlib and results in providing more savings for a fixed CPU
usage. For links smaller than 100 Mbps, zlib still provides a better
compression ratio, at the expense of a much higher CPU usage.

Larger input files provide slightly higher bandwidth for both libs, at
the expense of a bit more memory usage for zlib (it converges to 256kB
per connection).
2015-03-29 03:32:06 +02:00
Willy Tarreau
7b21877888 CLEANUP: compression: remove unused reset functions
It's unclear what purpose these functions used to server, however they
are not used anywhere, one good reason to remove them.
2015-03-28 22:08:25 +01:00
Willy Tarreau
9787efa97c MEDIUM: compression: split deflate_flush() into flush and finish
This function used to take a zlib-specific flag as argument to indicate
whether a buffer flush or end of contents was met, let's split it in two
so that we don't depend on zlib anymore.
2015-03-28 19:17:31 +01:00
Willy Tarreau
615105e7e8 MEDIUM: compression: add a distinction between UA- and config- algorithms
Thanks to MSIE/IIS, the "deflate" name is ambigous. According to the RFC
it's a zlib-wrapped deflate stream, but IIS used to send only a raw deflate
stream, which is the only format MSIE understands for "deflate". The other
widely used browsers do support both formats. For this reason some people
prefer to emit a raw deflate stream on "deflate" to serve more users even
it that means violating the standards. Haproxy only follows the standard,
so they cannot do this.

This patch makes it possible to have one algorithm name in the configuration
and another one in the protocol. This will make it possible to have a new
configuration token to add a different algorithm so that users can decide if
they want a raw deflate or the standard one.
2015-03-28 16:46:38 +01:00
Willy Tarreau
9f640a1eab CLEANUP: compression: statify all algo-specific functions
There's no reason for exporting identity_* nor deflate_*, they're only
used in the same file. Mark them static, it will make it easier to add
other algorithms.
2015-03-28 15:46:00 +01:00
Willy Tarreau
15530d28a4 MEDIUM: compression: don't send leading zeroes with chunk size
Till now we used to rely on a fixed maximum chunk size. Thanks to last
commit we're now free to adjust the chunk's length before sending the
data, so we don't have to use 6 digits all the time anymore, and if
one wants buffers larger than 16 MB it is now possible.
2015-03-28 12:05:47 +01:00
Thierry FOURNIER
08504f4e19 MINOR: lua: create and register HTTP class
This class is accessible via the TXN object. It is created only if
the attached proxy have HTTP mode. It contain all the HTTP
manipulation functions:

 - req_get_headers
 - req_del_header
 - req_rep_header
 - req_rep_value
 - req_add_header
 - req_set_header
 - req_set_method
 - req_set_path
 - req_set_query
 - req_set_uri

 - res_get_headers
 - res_del_header
 - res_rep_header
 - res_rep_value
 - res_add_header
 - res_set_header
2015-03-18 11:34:06 +01:00
Thierry FOURNIER
7fe75e0dab MINOR: http: export function inet_set_tos()
This is used by Lua.
2015-03-18 11:34:06 +01:00
Thierry FOURNIER
5531f87ace MINOR: http: split http_transform_header() function in two parts.
This function is a callback for HTTP actions. This function
creates the replacement string from a build_logline() format
and transform the header.

This patch split this function in two part. With this modification,
the header transformation and the replacement string are separed.

We can now transform the header with another replacement string
source than a build_logline() format.
2015-03-18 11:34:06 +01:00
Thierry FOURNIER
b77aece24a MINOR: http: split the function http_action_set_req_line() in two parts
The first part is the replacement engine. It take a replacement action
number and a replacement string and process the action.

The second part is the function which is called by the 'http-request
action' to replace a request line part. This function makes the
string used as replacement.

This split permits to use the replacement engine in other parts of the
code than the request action. The Lua use it for his own http action.
2015-03-18 11:34:06 +01:00
Willy Tarreau
10b688f2b4 MEDIUM: listener: store the default target per listener
This will be useful later to state that some listeners have to use
certain decoders (typically an HTTP/2 decoder) regardless of the
regular processing applied to other listeners. For now it simply
defaults to the frontend's default target, and it is used by the
session.
2015-03-13 16:45:37 +01:00
Willy Tarreau
512fd00296 CLEANUP: listeners: remove unused timeout
Listerner->timeout is a vestigal thing going back to 2007 or so. It
used to only be used by stats and peers frontends to hold a pointer
to the proxy's client timeout. Now that we use regular frontends, we
don't use it anymore.
2015-03-13 16:25:15 +01:00
Willy Tarreau
f87ab94e3b MINOR: proxy: store the default target into the frontend's configuration
Some services such as peers and CLI pre-set the target applet immediately
during accept(), and for this reason they're forced to have a dedicated
accept() function which does not even properly follow everything the regular
one does (eg: sndbuf/rcvbuf/linger/nodelay are not set, etc).

Let's store the default target when known into the frontend's config so that
it's session_accept() which automatically sets it.
2015-03-13 16:23:00 +01:00
Willy Tarreau
91d9628a51 MINOR: peers: centralize configuration of the peers frontend
This is in order to stop exporting the peer_accept() function.
2015-03-13 16:23:00 +01:00
Thierry FOURNIER
933e5deb2b MEDIUM: map: uses HAProxy facilities to store default value
The default value is stored in a special struct that describe the
map. This default value is parsed with special parser. This is
useless because HAProxy provides a space to store the default
value (the args) and HAProxy provides also standard parser for
the input types (args again).

This patch remove this special storage and replace it by an argument.
In other way the args of maps are declared as the expected type in
place of strings.
2015-03-13 14:10:30 +01:00
Willy Tarreau
d85c48589a REORG: connection: move conn_drain() to connection.c and rename it
It's now called conn_sock_drain() to make it clear that it only reads
at the sock layer and not at the data layer. The function was too big
to remain inlined and it's used at a few places where size counts.
2015-03-13 00:42:48 +01:00
Willy Tarreau
f31fb07958 MEDIUM: connection: make conn_drain() perform more controls
Currently si_idle_conn_null_cb() has to perform some low-level checks
over the file descriptor and the connection configuration that should
only belong to conn_drain(). Let's move these controls there. The
function now automatically checks for errors and hangups on the file
descriptor for example, and disables recv polling if there's no drain
function at the control layer.
2015-03-13 00:32:20 +01:00
Willy Tarreau
ff3e648812 MINOR: connection: implement conn_sock_send()
This function is an equivalent to send() which operates over a connection
instead of a file descriptor. It checks that the control layer is ready
and that it's allowed to send. If automatically enables polling if it
cannot send. It simplifies the return checks by returning zero in all
cases where it cannot send so that the caller only has to care about
negative values indicating errors.
2015-03-13 00:04:49 +01:00
Willy Tarreau
729c69f6e5 MINOR: connection: perform the call to xprt->shutw() in conn_data_shutw()
This will save callers from having to care about conn->xprt and xprt->shutw.
Note that shutw() takes a second argument indicating whether it's a clean or
a hard shutw. This is used by SSL which tries to close cleanly in most cases.

Here we provide two versions, conn_data_shutw() which performs the clean
close, and conn_data_shutw_hard() which does the unclean one.
2015-03-12 22:51:10 +01:00
Willy Tarreau
a02e8c9510 MINOR: connection: make conn_sock_shutw() actually perform the shutdown() call
This function was not used yet and was only supposed to mark the connection
as shutdown for write. Unfortunately at other places in stream_interface.c,
we're seeing a bit of layering violations with attempts to perform the shutdown
on the fd directly. Let's make this function call shutdown() itself so that
the callers only have to care about the connection.
2015-03-12 22:42:29 +01:00
Willy Tarreau
2ec2274f90 MEDIUM: lua: remove hlua_sample_fetch
This struct used to carry only a sample fetch function. Thanks to
lua_pushuserdata(), we don't need to have the Lua engine allocate
that struct for us and we can simply push our pointer onto the stack.
This makes the code more readable by removing several occurrences of
"f->f->". Just like the previous patch, it comes with the nice effect
of saving about 1.3% of performance when fetching samples from Lua.
2015-03-11 20:41:47 +01:00
Willy Tarreau
47860ed505 MEDIUM: lua: remove struct hlua_channel
Since last cleanups, this one was only used to carry a struct channel.
Removing it makes the code a bit cleaner (no more chn->chn) and easier
to follow (no more abstraction for a common type). Interestingly it
happens to also make the Lua code slightly faster (about 1.5%) when
using channels, probably thanks to less pointer dereferences and maybe
the use of lua_pushlightuserdata().
2015-03-11 20:41:47 +01:00
Willy Tarreau
f6870521e5 CLEANUP: lua: remove the session pointer from hlua_channel
It was only used to find what side the channel belongs to.
2015-03-11 20:41:47 +01:00
Willy Tarreau
78955f4c8b MEDIUM: session: simplify receive buffer allocator to only use the channel
Now that we can get the session from the channel, let's simplify the
prototype of session_alloc_recv_buffer() to only require the channel.
Both the caller and the function are now simplified.
2015-03-11 20:41:47 +01:00
Willy Tarreau
d5ccfa3fc5 MINOR: channel: add chn_sess() helper to retrieve session from channel
Channels already have to know what session they below to. Add a helper
to retrieve this pointer so that we'll use it later.
2015-03-11 20:41:47 +01:00
Willy Tarreau
81cd90069a MEDIUM: channel: remove now unused ->prod and ->cons pointers
Nothing uses them anymore.
2015-03-11 20:41:47 +01:00
Willy Tarreau
5decc05a9e MAJOR: channel: only rely on the new CF_ISRESP flag to find the SI
Now we exclusively use this flag to find what side a channel is and
where the stream ints are. The ->prod and ->cons are not used anymore.
2015-03-11 20:41:47 +01:00
Willy Tarreau
ef573c0f22 MEDIUM: channel: add a new flag "CF_ISRESP" for the response channel
This flag designates the response channel. This will be used to know
what channel we're seeing and finding our way back to the session.
2015-03-11 20:41:47 +01:00
Willy Tarreau
73796535a9 REORG/MEDIUM: channel: only use chn_prod / chn_cons to find stream-interfaces
The purpose of these two macros will be to pass via the session to
find the relevant stream interfaces so that we don't need to store
the ->cons nor ->prod pointers anymore. Currently they're only defined
so that all references could be removed.

Note that many places need a second pass of clean up so that we don't
have any chn_prod(&s->req) anymore and only &s->si[0] instead, and
conversely for the 3 other cases.
2015-03-11 20:41:47 +01:00
Willy Tarreau
50fe03be78 CLEANUP: stream-int: add si_opposite() to find the other stream interface
At a few places we need to find one stream interface from the other one.
Instead of passing via the channel, we simply use the session as an
intermediary, which simply results in applying an offset to the pointer.
2015-03-11 20:41:47 +01:00
Willy Tarreau
4e4292b9af CLEANUP: stream-int: add si_ib/si_ob to dereference the buffers
This makes the code cleaner and is more intuitive to use.
2015-03-11 20:41:46 +01:00
Willy Tarreau
819d332dfd MEDIUM: stream-int: remove any reference to the owner
si->owner is not used anymore now, so let's remove any reference to it.
2015-03-11 20:41:46 +01:00
Willy Tarreau
07373b8660 MEDIUM: stream-int: use si_task() to retrieve the task from the stream int
We go back to the session to get the owner. Here again it's very easy
and is just a matter of relative offsets. Since the owner always exists
and always points to the session's task, we can remove some unneeded
tests.
2015-03-11 20:41:46 +01:00
Willy Tarreau
aefd79004c MEDIUM: stream-int: make si_sess() use the stream int's side
This one relies on the SI's side to find the pointer to the session.
That the stream interface doesn't have to look at the task's context
anymore.
2015-03-11 20:41:46 +01:00
Willy Tarreau
a2df3fa251 MEDIUM: stream-interface: remove now unused pointers to channels
Everyone must now use si_ic() / si_oc() to find the relevant channels,
the points have been totally removed.
2015-03-11 20:41:46 +01:00
Willy Tarreau
0b2fb7f9a3 MAJOR: stream-int: only rely on SI_FL_ISBACK to find the requested channel
In order to plan removal of si->ib / si->ob, we now check the side of the
stream interface and find the session, then the requested channel. In
practice it's just an offset applied to the pointer based on the flag.
2015-03-11 20:41:46 +01:00
Willy Tarreau
a5f5d8dc69 MEDIUM: stream-int: add a flag indicating which side the SI is on
This new flag "SI_FL_ISBACK" is set only on the back SI and is cleared
on the front SI. That way it's possible only by looking at the SI to
know what side it is.
2015-03-11 20:41:46 +01:00
Willy Tarreau
2bb4a96f8f REORG/MEDIUM: stream-int: introduce si_ic/si_oc to access channels
We'll soon remove direct references to the channels from the stream
interface since everything belongs to the same session, so let's
first not dereference si->ib / si->ob anymore and use macros instead.
2015-03-11 20:41:46 +01:00
Willy Tarreau
a27dc19eda CLEANUP: remove now unused channel pool
The channels are now part of the struct session. Their pool is
not needed anymore.
2015-03-11 20:41:46 +01:00
Willy Tarreau
22ec1eadd0 REORG/MAJOR: move session's req and resp channels back into the session
The channels were pointers to outside structs and this is not needed
anymore since the buffers have moved, but this complicates operations.
Move them back into the session so that both channels and stream interfaces
are always allocated for a session. Some places (some early sample fetch
functions) used to validate that a channel was NULL prior to dereferencing
it. Now instead we check if chn->buf is NULL and we force it to remain NULL
until the channel is initialized.
2015-03-11 20:41:46 +01:00
Thierry FOURNIER
2694a1a3c8 MINOR: lua: fetches and converters can return an empty string in place of nil
In some cases we don't want to known if a fetch or converter
fails. We just want a valid string. After this patch, we
have two sets of fetches and two sets of converters. There are:
txn.f, txn.sf, txn.c, txn.sc. The version prefixed by 's' always
returns strings for any type, and returns an empty string in the
error case or when the data are not available. This is particularly
useful when manipulating headers or cookies.
2015-03-11 20:26:49 +01:00
Thierry FOURNIER
594afe76e4 MINOR: lua: wrapper for converters
This patch implements a wrapper to give access to the converters
in the Lua code. The converters are used with the transaction.
The automatically created function are prefixed by "conv_".
2015-03-11 19:55:10 +01:00
Thierry FOURNIER
8fd1376014 MINOR: converters: add function to browse converters
This patch adds a fucntion to browse each converter. This
is used with Lua for using the converters with a wrapper.
2015-03-11 19:55:10 +01:00
Thierry FOURNIER
bb53c7b687 MEDIUM: lua: create a namespace for the fetches
HAProxy proposes many sample fetches. It is possible that the
automatic registration of the sample fetches causes a collision
with an existing Lua function. This patch sets a namespace for
the sample fetches.
2015-03-11 19:55:10 +01:00
Thierry FOURNIER
d2b597aa10 BUG/MEDIUM: lua: segfault with buffer_replace2
The function buffer_contig_space() returns the contiguous space avalaible
to add data (at the end of the input side) while the function
hlua_channel_send_yield() needs to insert data starting at p. Here we
introduce a new function bi_space_for_replace() which returns the amount
of space that can be inserted at the head of the input side with one of
the buffer_replace* functions.

This patch proposes a function that returns the space avalaible after buf->p.
2015-03-09 18:12:59 +01:00