544 Commits

Author SHA1 Message Date
Willy Tarreau
520d95e42b [MAJOR] buffers: split BF_WRITE_ENA into BF_AUTO_CONNECT and BF_AUTO_CLOSE
The BF_WRITE_ENA buffer flag became very complex to deal with, because
it was used to :
  - enable automatic connection
  - enable close forwarding
  - enable data forwarding

The last point was not very true anymore since we introduced ->send_max,
but still the test remained everywhere. This was causing issues such as
impossibility to connect without forwarding data, impossibility to prevent
closing when data was forwarded, etc...

This patch clarifies the situation by getting rid of this multi-purpose
flag and replacing it with :
  - data forwarding based only on ->send_max || ->pipe ;
  - a new BF_AUTO_CONNECT flag to allow automatic connection and only
    that ;
  - ability to perform an automatic connection when ->send_max or ->pipe
    indicate that data is waiting to leave the buffer ;
  - a new BF_AUTO_CLOSE flag to let the producer automatically set the
    BF_SHUTW_NOW flag when it gets a BF_SHUTR.

During this cleanup, it was discovered that some tests were performed
twice, or that the BF_HIJACK flag was still tested, which is not needed
anymore since ->send_max replcaed it. These places have been fixed too.

These cleanups have also revealed a few areas where the other flags
such as BF_EMPTY are not cleanly used. This will be an opportunity for
a second patch.
2009-09-19 21:14:54 +02:00
Willy Tarreau
c77e761968 [MINOR] buffers: inline buffer_si_putchar()
By inlining this function and slightly reordering it, we can double
the getchar/putchar test throughput, and reduce its footprint by about
40 bytes. Also, it was the only non-inlined char-based function, which
now makes it more consistent this time.
2009-09-19 16:34:18 +02:00
Willy Tarreau
9cb8daa203 [MINOR] buffers: add buffer_cut_tail() to cut only unsent data
This function is used to cut the "tail" of a buffer, which means strip it
to the length of unsent data only, and kill any remaining unsent data. Any
scheduled forwarding is stopped. This is mainly to be used to send error
messages after existing data. It does the same as buffer_erase() for buffers
without pending outgoing data.
2009-09-19 14:53:47 +02:00
Willy Tarreau
91aa577b1f [BUG] buffer_forward() would not correctly consider data already scheduled
The computations in buffer_forward() were only valid if buffer_forward()
was used on a buffer which had no more data scheduled for forwarding.
This is always the case right now so this bug is not yet triggered but
it will soon be. Now we correctly discount the bytes to be forwarded
from the data already present in the buffer.
2009-09-19 14:53:47 +02:00
Willy Tarreau
36a5c5389d [MINOR] buffers: provide buffer_si_putchar() to send a char from a stream interface
This function works like a traditional putchar() except that it
can return 0 if the output buffer is full.

Now a basic character-based echo function would look like this, from
a stream interface :

	while (1) {
		c = buffer_si_peekchar(req);
		if (c < 0)
			break;
		if (!buffer_si_putchar(res, c)) {
			si->flags |= SI_FL_WAIT_ROOM;
			break;
		}
		buffer_skip(req, 1);
		req->flags |= BF_WRITE_PARTIAL;
		res->flags |= BF_READ_PARTIAL;
	}
2009-09-19 14:53:47 +02:00
Willy Tarreau
4fe7a2ec6c [MINOR] buffers: add peekchar and peekline functions for stream interfaces
The buffer_si_peekline() function is sort of a fgets() to be used from a
stream interface. It returns a complete line whenever possible, and does
not update the buffer's pointer, so that the reader is free to consume
what it wants to.

buffer_si_peekchar() only returns one character, and also needs a call
to buffer_skip() once the character is definitely consumed.
2009-09-19 14:53:47 +02:00
Willy Tarreau
aeac31979e [MEDIUM] buffers: provide new buffer_feed*() function
This functions act like their buffer_write*() counter-parts,
except that they're specifically designed to be used from a
stream interface handler, as they carefully check size limits
and automatically advance the read pointer depending on the
to_forward attribute.

buffer_feed_chunk() is an inline calling buffer_feed() as both
are the sames. For this reason, buffer_write_chunk() has also
been turned into an inline which calls buffer_write().
2009-09-19 14:53:46 +02:00
Willy Tarreau
2b7addc833 [MINOR] buffers: provide more functions to handle buffer data
buffer_contig_space(), buffer_contig_data() and buffer_skip()
provide easy methods to extract/insert data from/into a buffer.

buffer_write() and buffer_write_chunk() currently do not check
max_len nor to_forward, so they will quickly become embarrassing
to use or will need an equivalent. The reason is that they are
used to build error messages which currently are not subject to
analysis.
2009-09-19 14:53:46 +02:00
Willy Tarreau
a07a34eb24 [MEDIUM] replace BUFSIZE with buf->size in computations
The first step towards dynamic buffer size consists in removing
all static definitions of the buffer size. Instead, we store a
buffer's size in itself. Right now they're all preinitialized
to BUFSIZE, but we will change that.
2009-08-16 23:27:46 +02:00
Willy Tarreau
5ca791da8d [CLEANUP] move remaining stats sockets code to dumpstats
The remains of the stats socket code has nothing to do in proto_uxst
anymore and must move to dumpstats. The code is much cleaner and more
structured. It was also an opportunity to rename AN_REQ_UNIX_STATS
as AN_REQ_STATS_SOCK as the stats socket is no longer unix-specific
either.

The last item refering to stats in proto_uxst is the setting of the
task's nice value which should in fact come from the listener.
2009-08-16 19:35:36 +02:00
Willy Tarreau
8e13d7492d [CLEANUP] unix: remove uxst_process_session()
This one is not used anymore.
2009-08-16 19:34:23 +02:00
Willy Tarreau
104eb36f26 [MEDIUM] make the unix stats sockets use the generic session handler
process_session() is now ready to handle unix stats sockets. This
first step works and old code has not been removed. A cleanup is
required. The stats handler is not unix socket-centric anymore and
should move to dumpstats.c.
2009-08-16 19:33:51 +02:00
Willy Tarreau
9650f37628 [MEDIUM] move connection establishment from backend to the SI.
The connection establishment was completely handled by backend.c which
normally just handles LB algos. Since it's purely TCP, it must move to
proto_tcp.c. Also, instead of calling it directly, we now call it via
the stream interface, which will later help us unify session handling.
2009-08-16 17:46:15 +02:00
Emeric Brun
647caf1ebc [MEDIUM] add support for RDP cookie persistence
The new statement "persist rdp-cookie" enables RDP cookie
persistence. The RDP cookie is then extracted from the RDP
protocol, and compared against available servers. If a server
matches the RDP cookie, then it gets the connection.
2009-07-14 12:50:40 +02:00
Emeric Brun
bede3d0ef4 [MINOR] acl: add support for matching of RDP cookies
The RDP protocol is quite simple and documented, which permits
an easy detection and extraction of cookies. It can be useful
to match the MSTS cookie which can contain the username specified
by the client.
2009-07-14 12:50:39 +02:00
Willy Tarreau
bedb9bad67 [MINOR] prepare callers of session_set_backend to handle errors
session_set_backend will soon have to allocate areas for HTTP
headers. We must ensure that the callers can handle an allocation
error.
2009-07-12 08:36:24 +02:00
Willy Tarreau
1d0dfb155d [MAJOR] http: complete splitting of the remaining stages
The HTTP processing has been splitted into 7 steps, one of which
is not anymore HTTP-specific (content-switching). That way, it
becomes possible to use "use_backend" rules in TCP mode. A new
"use_server" directive should follow soon.
2009-07-07 15:10:31 +02:00
Willy Tarreau
3a816293e9 [MEDIUM] session: tell analysers what bit they were called for
Some stream analysers might become generic enough to be called
for several bits. So we cannot have the analyser bit hard coded
into the analyser itself. Let's make the caller inform the callee.
2009-07-07 10:55:49 +02:00
Willy Tarreau
d787e6648c [MEDIUM] http: split request waiter from request processor
We want to split several steps in HTTP processing so that
we can call individual analysers depending on what processing
we want to perform. The first step consists in splitting the
part that waits for a request from the rest.
2009-07-07 10:14:51 +02:00
Willy Tarreau
915e1ebe63 [MEDIUM] config: split parser and checker in two functions
This is a first step towards support of multiple configuration files.
Now readcfgfile() only reads a file in memory and performs very minimal
parsing. The checks are performed afterwards.
2009-06-23 08:17:17 +02:00
Willy Tarreau
c6f4ce8fc4 [MEDIUM] add support for binding to source port ranges during connect
Some users are already hitting the 64k source port limit when
connecting to servers. The system usually maintains a list of
unused source ports, regardless of the source IP they're bound
to. So in order to go beyond the 64k concurrent connections, we
have to manage the source ip:port lists ourselves.

The solution consists in assigning a source port range to each
server and use a free port in that range when connecting to that
server, either for a proxied connection or for a health check.
The port must then be put back into the server's range when the
connection is closed.

This mechanism is used only when a port range is specified on
a server. It makes it possible to reach 64k connections per
server, possibly all from the same IP address. Right now it
should be more than enough even for huge deployments.
2009-06-10 12:23:32 +02:00
Willy Tarreau
13a34bd110 [MINOR] compute the max of sessions/s on fe/be/srv
Some users want to keep the max sessions/s seen on servers, frontends
and backends for capacity planning. It's easy to grab it while the
session count is updated, so let's keep it.
2009-05-10 18:52:49 +02:00
Willy Tarreau
8f38bd0497 [MINOR] add basic signal handling functions
These functions will be used to deliver asynchronous signals in order
to make the signal handling functions more robust. The goal is to keep
the same interface to signal handlers.
2009-05-10 09:24:23 +02:00
Willy Tarreau
40d2516371 [BUILD] add format(printf) to printf-like functions
Doing this helps catching warnings about wrong output formats.
2009-04-03 12:01:47 +02:00
Willy Tarreau
4076a15255 [MEDIUM] http: capture invalid requests/responses even if accepted
It's useful to be able to accept an invalid header name in a request
or response but still be able to monitor further such errors. Now,
when an invalid request/response is received and accepted due to
an "accept-invalid-http-{request|response}" option, the invalid
request will be captured for later analysis with "show errors" on
the stats socket.
2009-04-02 21:36:37 +02:00
Willy Tarreau
3884cbaae6 [MINOR] show sess: report number of calls to each task
For debugging purposes, it can be useful to know how many times each
task has been called.
2009-03-28 17:54:35 +01:00
Willy Tarreau
c7bdf09f9f [MINOR] stats: report number of tasks (active and running)
It may be useful for statistics purposes to report the number of
tasks.
2009-03-21 18:33:52 +01:00
Willy Tarreau
a461318f97 [MINOR] task: keep a task count and clean up task creators
It's sometimes useful at least for statistics to keep a task count.
It's easy to do by forcing the rare task creators to always use the
same functions to create/destroy a task.
2009-03-21 18:13:21 +01:00
Willy Tarreau
26ca34e66e [BUG] scheduler: fix improper handling of duplicates __task_queue()
The top of a duplicate tree is not where bit == -1 but at the most
negative bit. This was causing tasks to be queued in reverse order
within duplicates. While this is not dramatic, it's incorrect and
might lead to longer than expected duplicate depths under some
circumstances.
2009-03-21 12:57:06 +01:00
Willy Tarreau
e35c94a748 [MEDIUM] scheduler: get rid of the 4 trees thanks and use ebtree v4.1
Since we're now able to search from a precise expiration date in
the timer tree using ebtree 4.1, we don't need to maintain 4 trees
anymore. Not only does this simplify the code a lot, but it also
ensures that we can always look 24 days back and ahead, which
doubles the ability of the previous scheduler. Indeed, while based
on absolute values, the timer tree is now relative to <now> as we
can always search from <now>-31 bits.

The run queue uses the exact same principle now, and is now simpler
and a bit faster to process. With these changes alone, an overall
0.5% performance gain was observed.

Tests were performed on the few wrapping cases and everything works
as expected.
2009-03-21 10:25:14 +01:00
Willy Tarreau
844553303d [BUG] session: errors were not reported in termination flags in TCP mode
In order to get termination flags properly updated, the session was
relying a bit too much on http_return_srv_error() which is http-centric.

A generic srv_error function was implemented in the session in order to
catch all connection abort situations. It was then noticed that a request
abort during a connection attempt was not reported, which is now fixed.

Read and write errors/timeouts were not logged either. It was necessary
to add those tests at 4 new locations.

Now it looks like everything is correctly logged. Most likely some error
checking code could now be removed from some analysers.
2009-03-15 22:34:05 +01:00
Willy Tarreau
ff01a21ebe [MINOR] cfgparse: some cleanups in the consistency checks
Check for servers in health mode, for health mode in pure-backends.
Some code have been refactored for better organization.
2009-03-15 13:46:16 +01:00
Willy Tarreau
e8a28bf165 [MINOR] buffers: implement buffer_flush()
This function will flush the buffer's data, which means that all data
remaining in the buffer will be scheduled for sending.
2009-03-08 21:12:04 +01:00
Willy Tarreau
6f0aa476bd [CLEANUP] buffer_flush() was misleading, rename it as buffer_erase 2009-03-08 20:33:29 +01:00
Willy Tarreau
531cf0cf8d [OPTIM] task: reduce the number of calls to task_queue()
Most of the time, task_queue() will immediately return. By extracting
the preliminary checks and putting them in an inline function, we can
significantly reduce the number of calls to the function itself, and
most of the tests can be optimized away due to the caller's context.

Another minor improvement in process_runnable_tasks() consisted in
taking benefit from the processor's branch prediction unit by making
a special case of the process_session() callback which is by far the
most common one.

All this improved performance by about 1%, mainly during the call
from process_runnable_tasks().
2009-03-08 16:35:27 +01:00
Willy Tarreau
d0a201b35c [CLEANUP] task: distinguish between clock ticks and timers
Timers are unsigned and used as tree positions. Ticks are signed and
used as absolute date within current time frame. While the two are
normally equal (except zero), it's important not to confuse them in
the code as they are not interchangeable.

We add two inline functions to turn each one into the other.

The comments have also been moved to the proper location, as it was
not easy to understand what was a tick and what was a timer unit.
2009-03-08 15:58:07 +01:00
Willy Tarreau
26c250683f [MEDIUM] minor update to the task api: let the scheduler queue itself
All the tasks callbacks had to requeue the task themselves, and update
a global timeout. This was not convenient at all. Now the API has been
simplified. The tasks callbacks only have to update their expire timer,
and return either a pointer to the task or NULL if the task has been
deleted. The scheduler will take care of requeuing the task at the
proper place in the wait queue.
2009-03-08 09:38:41 +01:00
Willy Tarreau
4726f53794 [OPTIM] task: don't unlink a task from a wait queue when waking it up
In many situations, we wake a task on an I/O event, then queue it
exactly where it was. This is a real waste because we delete/insert
tasks into the wait queue for nothing. The only reason for this is
that there was only one tree node in the task struct.

By adding another tree node, we can have one tree for the timers
(wait queue) and one tree for the priority (run queue). That way,
we can have a task both in the run queue and wait queue at the
same time. The wait queue now really holds timers, which is what
it was designed for.

The net gain is at least 1 delete/insert cycle per session, and up
to 2-3 depending on the workload, since we save one cycle each time
the expiration date is not changed during a wake up.
2009-03-08 07:59:18 +01:00
Willy Tarreau
79584225e5 [OPTIM] rate-limit: cleaner behaviour on low rates and reduce consumption
The rate-limit was applied to the smoothed value which does a special
case for frequencies below 2 events per period. This caused irregular
limitations when set to 1 session per second.

The proper way to handle this is to compute the number of remaining
events that can occur without reaching the limit. This is what has
been added. It also has the benefit that the frequency calculation
is now done once when entering event_accept(), before the accept()
loop, and not once per accept() loop anymore, thus saving a few CPU
cycles during very high loads.

With this fix, rate limits of 1/s are perfectly respected.
2009-03-06 09:18:27 +01:00
Willy Tarreau
7f062c4193 [MEDIUM] measure and report session rate on frontend, backends and servers
With this change, all frontends, backends, and servers maintain a session
counter and a timer to compute a session rate over the last second. This
value will be very useful because it varies instantly and can be used to
check thresholds. This value is also reported in the stats in a new "rate"
column.
2009-03-05 18:43:00 +01:00
Willy Tarreau
74808cb907 [MEDIUM] implement error dump on unix socket with "show errors"
The new "show errors" command sent on a unix socket will dump
all captured request and response errors for all proxies. It is
also possible to bound the log to frontends and backends whose
ID is passed as an optional parameter.

The output provides information about frontend, backend, server,
session ID, source address, error type, and error position along
with a complete dump of the request or response which has caused
the error.

If a new error scratches the one currently being reported, then
the dump is aborted with a warning message, and processing goes
on to next error.
2009-03-04 15:53:18 +01:00
Willy Tarreau
3eba98aa57 [MEDIUM] splice: make use of pipe pools
Using pipe pools makes pipe management a lot easier. It also allows to
remove quite a bunch of #ifdefs in areas which depended on the presence
or not of support for kernel splicing.

The buffer now holds a pointer to a pipe structure which is always NULL
except if there are still data in the pipe. When it needs to use that
pipe, it dynamically allocates it from the pipe pool. When the data is
consumed, the pipe is immediately released.

That way, there is no need anymore to care about pipe closure upon
session termination, nor about pipe creation when trying to use
splice().

Another immediate advantage of this method is that it considerably
reduces the number of pipes needed to use splice(). Tests have shown
that even with 0.2 pipe per connection, almost all sessions can use
splice(), because the same pipe may be used by several consecutive
calls to splice().
2009-01-25 13:56:13 +01:00
Willy Tarreau
982b6e37e4 [MEDIUM] introduce pipe pools
A new data type has been added : pipes. Some pre-allocated empty pipes
are maintained in a pool for users such as splice which use them a lot
for very short times.

Pipes are allocated using get_pipe() and released using put_pipe().
Pipes which are released with pending data are immediately killed.
The struct pipe is small (16 to 20 bytes) and may even be further
reduced by unifying ->data and ->next.

It would be nice to have a dedicated cleanup task which would watch
for the pipes usage and destroy a few of them from time to time.
2009-01-25 13:49:53 +01:00
Willy Tarreau
259de1b702 [MINOR] introduce structures required to support Linux kernel splicing
When CONFIG_HAP_LINUX_SPLICE is defined, the buffer structure will be
slightly enlarged to support information needed for kernel splicing
on Linux.

A first attempt consisted in putting this information into the stream
interface, but in the long term, it appeared really awkward. This
version puts the information into the buffer. The platform-dependant
part is conditionally added and will only enlarge the buffers when
compiled in.

One new flag has also been added to the buffers: BF_KERN_SPLICING.
It indicates that the application considers it is appropriate to
use splicing to forward remaining data.
2009-01-18 21:56:21 +01:00
Willy Tarreau
03d60bbaf9 [OPTIM] buffer: replace rlim by max_len
In the buffers, the read limit used to leave some place for header
rewriting was set by a pointer to the end of the buffer. Not only
this required subtracts at every place in the code, but this will
also soon not be usable anymore when we want to support keepalive.

Let's replace this with a length limit, comparable to the buffer's
length. This has also sightly reduced the code size.
2009-01-09 11:14:39 +01:00
Willy Tarreau
0abebcc0fb [MEDIUM] i/o: rework ->to_forward and ->send_max
The way the buffers and stream interfaces handled ->to_forward was
really not handy for multiple reasons. Now we've moved its control
to the receive-side of the buffer, which is also responsible for
keeping send_max up to date. This makes more sense as it now becomes
possible to send some pre-formatted data followed by forwarded data.

The following explanation has also been added to buffer.h to clarify
the situation. Right now, tests show that the I/O is behaving extremely
well. Some work will have to be done to adapt existing splice code
though.

/* Note about the buffer structure

   The buffer contains two length indicators, one to_forward counter and one
   send_max limit. First, it must be understood that the buffer is in fact
   split in two parts :
     - the visible data (->data, for ->l bytes)
     - the invisible data, typically in kernel buffers forwarded directly from
       the source stream sock to the destination stream sock (->splice_len
       bytes). Those are used only during forward.

   In order not to mix data streams, the producer may only feed the invisible
   data with data to forward, and only when the visible buffer is empty. The
   consumer may not always be able to feed the invisible buffer due to platform
   limitations (lack of kernel support).

   Conversely, the consumer must always take data from the invisible data first
   before ever considering visible data. There is no limit to the size of data
   to consume from the invisible buffer, as platform-specific implementations
   will rarely leave enough control on this. So any byte fed into the invisible
   buffer is expected to reach the destination file descriptor, by any means.
   However, it's the consumer's responsibility to ensure that the invisible
   data has been entirely consumed before consuming visible data. This must be
   reflected by ->splice_len. This is very important as this and only this can
   ensure strict ordering of data between buffers.

   The producer is responsible for decreasing ->to_forward and increasing
   ->send_max. The ->to_forward parameter indicates how many bytes may be fed
   into either data buffer without waking the parent up. The ->send_max
   parameter says how many bytes may be read from the visible buffer. Thus it
   may never exceed ->l. This parameter is updated by any buffer_write() as
   well as any data forwarded through the visible buffer.

   The consumer is responsible for decreasing ->send_max when it sends data
   from the visible buffer, and ->splice_len when it sends data from the
   invisible buffer.

   A real-world example consists in part in an HTTP response waiting in a
   buffer to be forwarded. We know the header length (300) and the amount of
   data to forward (content-length=9000). The buffer already contains 1000
   bytes of data after the 300 bytes of headers. Thus the caller will set
   ->send_max to 300 indicating that it explicitly wants to send those data,
   and set ->to_forward to 9000 (content-length). This value must be normalised
   immediately after updating ->to_forward : since there are already 1300 bytes
   in the buffer, 300 of which are already counted in ->send_max, and that size
   is smaller than ->to_forward, we must update ->send_max to 1300 to flush the
   whole buffer, and reduce ->to_forward to 8000. After that, the producer may
   try to feed the additional data through the invisible buffer using a
   platform-specific method such as splice().
 */
2009-01-09 10:15:03 +01:00
Willy Tarreau
dcef33fa9b [MINOR] add the splice_len member to the buffer struct in preparation of splice support
In preparation of splice support, let's add the splice_len member
to the buffer struct. An earlier implementation made it conditional,
which made the whole logics very complex due to a large number of
ifdefs.

Now BF_EMPTY is only set once both buf->l and buf->splice_len are
null. Splice_len is initialized to zero during buffer creation and
is currently not changed, so the whole logics remains unaffected.

When splice gets merged, splice_len will reflect the number of bytes
in flight out of the buffer but not yet sent, typically in a pipe for
the Linux case.
2009-01-09 10:15:02 +01:00
Willy Tarreau
6b66f3e4f6 [MAJOR] implement autonomous inter-socket forwarding
If an analyser sets buf->to_forward to a given value, that many
data will be forwarded between the two stream interfaces attached
to a buffer without waking the task up. The same applies once all
analysers have been released. This saves a large amount of calls
to process_session() and a number of task_dequeue/queue.
2009-01-09 10:15:02 +01:00
Willy Tarreau
3ffeba1f67 [MEDIUM] enable inter-stream_interface wakeup calls
By letting the producer tell the consumer there is data to check,
and the consumer tell the producer there is some space left again,
we can cut in half the number of session wakeups.

This is also an important starting point for future splicing support.
2008-12-28 11:09:02 +01:00
Willy Tarreau
86491c3164 [MEDIUM] indicate when we don't care about read timeout
Sometimes we don't care about a read timeout, for instance, from the
client when waiting for the server, but we still want the client to
be able to read.

Till now it was done by articially forcing the read timeout to ETERNITY.
But this will cause trouble when we want the low level stream sock to
communicate without waking the session up. So we add a BF_READ_NOEXP
flag to indicate that when the read timeout is to be set, it might
have to be set to ETERNITY.

Since BF_READ_ENA was not used, we replaced this flag.
2008-12-28 11:06:40 +01:00