haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-20 14:11:19 +02:00

Author	SHA1	Message	Date
Willy Tarreau	520d95e42b	[MAJOR] buffers: split BF_WRITE_ENA into BF_AUTO_CONNECT and BF_AUTO_CLOSE The BF_WRITE_ENA buffer flag became very complex to deal with, because it was used to : - enable automatic connection - enable close forwarding - enable data forwarding The last point was not very true anymore since we introduced ->send_max, but still the test remained everywhere. This was causing issues such as impossibility to connect without forwarding data, impossibility to prevent closing when data was forwarded, etc... This patch clarifies the situation by getting rid of this multi-purpose flag and replacing it with : - data forwarding based only on ->send_max \|\| ->pipe ; - a new BF_AUTO_CONNECT flag to allow automatic connection and only that ; - ability to perform an automatic connection when ->send_max or ->pipe indicate that data is waiting to leave the buffer ; - a new BF_AUTO_CLOSE flag to let the producer automatically set the BF_SHUTW_NOW flag when it gets a BF_SHUTR. During this cleanup, it was discovered that some tests were performed twice, or that the BF_HIJACK flag was still tested, which is not needed anymore since ->send_max replcaed it. These places have been fixed too. These cleanups have also revealed a few areas where the other flags such as BF_EMPTY are not cleanly used. This will be an opportunity for a second patch.	2009-09-19 21:14:54 +02:00
Willy Tarreau	c77e761968	[MINOR] buffers: inline buffer_si_putchar() By inlining this function and slightly reordering it, we can double the getchar/putchar test throughput, and reduce its footprint by about 40 bytes. Also, it was the only non-inlined char-based function, which now makes it more consistent this time.	2009-09-19 16:34:18 +02:00
Willy Tarreau	9cb8daa203	[MINOR] buffers: add buffer_cut_tail() to cut only unsent data This function is used to cut the "tail" of a buffer, which means strip it to the length of unsent data only, and kill any remaining unsent data. Any scheduled forwarding is stopped. This is mainly to be used to send error messages after existing data. It does the same as buffer_erase() for buffers without pending outgoing data.	2009-09-19 14:53:47 +02:00
Willy Tarreau	91aa577b1f	[BUG] buffer_forward() would not correctly consider data already scheduled The computations in buffer_forward() were only valid if buffer_forward() was used on a buffer which had no more data scheduled for forwarding. This is always the case right now so this bug is not yet triggered but it will soon be. Now we correctly discount the bytes to be forwarded from the data already present in the buffer.	2009-09-19 14:53:47 +02:00
Willy Tarreau	36a5c5389d	[MINOR] buffers: provide buffer_si_putchar() to send a char from a stream interface This function works like a traditional putchar() except that it can return 0 if the output buffer is full. Now a basic character-based echo function would look like this, from a stream interface : while (1) { c = buffer_si_peekchar(req); if (c < 0) break; if (!buffer_si_putchar(res, c)) { si->flags \|= SI_FL_WAIT_ROOM; break; } buffer_skip(req, 1); req->flags \|= BF_WRITE_PARTIAL; res->flags \|= BF_READ_PARTIAL; }	2009-09-19 14:53:47 +02:00
Willy Tarreau	4fe7a2ec6c	[MINOR] buffers: add peekchar and peekline functions for stream interfaces The buffer_si_peekline() function is sort of a fgets() to be used from a stream interface. It returns a complete line whenever possible, and does not update the buffer's pointer, so that the reader is free to consume what it wants to. buffer_si_peekchar() only returns one character, and also needs a call to buffer_skip() once the character is definitely consumed.	2009-09-19 14:53:47 +02:00
Willy Tarreau	aeac31979e	[MEDIUM] buffers: provide new buffer_feed() function This functions act like their buffer_write() counter-parts, except that they're specifically designed to be used from a stream interface handler, as they carefully check size limits and automatically advance the read pointer depending on the to_forward attribute. buffer_feed_chunk() is an inline calling buffer_feed() as both are the sames. For this reason, buffer_write_chunk() has also been turned into an inline which calls buffer_write().	2009-09-19 14:53:46 +02:00
Willy Tarreau	2b7addc833	[MINOR] buffers: provide more functions to handle buffer data buffer_contig_space(), buffer_contig_data() and buffer_skip() provide easy methods to extract/insert data from/into a buffer. buffer_write() and buffer_write_chunk() currently do not check max_len nor to_forward, so they will quickly become embarrassing to use or will need an equivalent. The reason is that they are used to build error messages which currently are not subject to analysis.	2009-09-19 14:53:46 +02:00
Willy Tarreau	a07a34eb24	[MEDIUM] replace BUFSIZE with buf->size in computations The first step towards dynamic buffer size consists in removing all static definitions of the buffer size. Instead, we store a buffer's size in itself. Right now they're all preinitialized to BUFSIZE, but we will change that.	2009-08-16 23:27:46 +02:00
Willy Tarreau	5ca791da8d	[CLEANUP] move remaining stats sockets code to dumpstats The remains of the stats socket code has nothing to do in proto_uxst anymore and must move to dumpstats. The code is much cleaner and more structured. It was also an opportunity to rename AN_REQ_UNIX_STATS as AN_REQ_STATS_SOCK as the stats socket is no longer unix-specific either. The last item refering to stats in proto_uxst is the setting of the task's nice value which should in fact come from the listener.	2009-08-16 19:35:36 +02:00
Willy Tarreau	8e13d7492d	[CLEANUP] unix: remove uxst_process_session() This one is not used anymore.	2009-08-16 19:34:23 +02:00
Willy Tarreau	104eb36f26	[MEDIUM] make the unix stats sockets use the generic session handler process_session() is now ready to handle unix stats sockets. This first step works and old code has not been removed. A cleanup is required. The stats handler is not unix socket-centric anymore and should move to dumpstats.c.	2009-08-16 19:33:51 +02:00
Willy Tarreau	9650f37628	[MEDIUM] move connection establishment from backend to the SI. The connection establishment was completely handled by backend.c which normally just handles LB algos. Since it's purely TCP, it must move to proto_tcp.c. Also, instead of calling it directly, we now call it via the stream interface, which will later help us unify session handling.	2009-08-16 17:46:15 +02:00
Emeric Brun	647caf1ebc	[MEDIUM] add support for RDP cookie persistence The new statement "persist rdp-cookie" enables RDP cookie persistence. The RDP cookie is then extracted from the RDP protocol, and compared against available servers. If a server matches the RDP cookie, then it gets the connection.	2009-07-14 12:50:40 +02:00
Emeric Brun	bede3d0ef4	[MINOR] acl: add support for matching of RDP cookies The RDP protocol is quite simple and documented, which permits an easy detection and extraction of cookies. It can be useful to match the MSTS cookie which can contain the username specified by the client.	2009-07-14 12:50:39 +02:00
Willy Tarreau	bedb9bad67	[MINOR] prepare callers of session_set_backend to handle errors session_set_backend will soon have to allocate areas for HTTP headers. We must ensure that the callers can handle an allocation error.	2009-07-12 08:36:24 +02:00
Willy Tarreau	1d0dfb155d	[MAJOR] http: complete splitting of the remaining stages The HTTP processing has been splitted into 7 steps, one of which is not anymore HTTP-specific (content-switching). That way, it becomes possible to use "use_backend" rules in TCP mode. A new "use_server" directive should follow soon.	2009-07-07 15:10:31 +02:00
Willy Tarreau	3a816293e9	[MEDIUM] session: tell analysers what bit they were called for Some stream analysers might become generic enough to be called for several bits. So we cannot have the analyser bit hard coded into the analyser itself. Let's make the caller inform the callee.	2009-07-07 10:55:49 +02:00
Willy Tarreau	d787e6648c	[MEDIUM] http: split request waiter from request processor We want to split several steps in HTTP processing so that we can call individual analysers depending on what processing we want to perform. The first step consists in splitting the part that waits for a request from the rest.	2009-07-07 10:14:51 +02:00
Willy Tarreau	915e1ebe63	[MEDIUM] config: split parser and checker in two functions This is a first step towards support of multiple configuration files. Now readcfgfile() only reads a file in memory and performs very minimal parsing. The checks are performed afterwards.	2009-06-23 08:17:17 +02:00
Willy Tarreau	c6f4ce8fc4	[MEDIUM] add support for binding to source port ranges during connect Some users are already hitting the 64k source port limit when connecting to servers. The system usually maintains a list of unused source ports, regardless of the source IP they're bound to. So in order to go beyond the 64k concurrent connections, we have to manage the source ip:port lists ourselves. The solution consists in assigning a source port range to each server and use a free port in that range when connecting to that server, either for a proxied connection or for a health check. The port must then be put back into the server's range when the connection is closed. This mechanism is used only when a port range is specified on a server. It makes it possible to reach 64k connections per server, possibly all from the same IP address. Right now it should be more than enough even for huge deployments.	2009-06-10 12:23:32 +02:00
Willy Tarreau	13a34bd110	[MINOR] compute the max of sessions/s on fe/be/srv Some users want to keep the max sessions/s seen on servers, frontends and backends for capacity planning. It's easy to grab it while the session count is updated, so let's keep it.	2009-05-10 18:52:49 +02:00
Willy Tarreau	8f38bd0497	[MINOR] add basic signal handling functions These functions will be used to deliver asynchronous signals in order to make the signal handling functions more robust. The goal is to keep the same interface to signal handlers.	2009-05-10 09:24:23 +02:00
Willy Tarreau	40d2516371	[BUILD] add format(printf) to printf-like functions Doing this helps catching warnings about wrong output formats.	2009-04-03 12:01:47 +02:00
Willy Tarreau	4076a15255	[MEDIUM] http: capture invalid requests/responses even if accepted It's useful to be able to accept an invalid header name in a request or response but still be able to monitor further such errors. Now, when an invalid request/response is received and accepted due to an "accept-invalid-http-{request\|response}" option, the invalid request will be captured for later analysis with "show errors" on the stats socket.	2009-04-02 21:36:37 +02:00
Willy Tarreau	3884cbaae6	[MINOR] show sess: report number of calls to each task For debugging purposes, it can be useful to know how many times each task has been called.	2009-03-28 17:54:35 +01:00
Willy Tarreau	c7bdf09f9f	[MINOR] stats: report number of tasks (active and running) It may be useful for statistics purposes to report the number of tasks.	2009-03-21 18:33:52 +01:00
Willy Tarreau	a461318f97	[MINOR] task: keep a task count and clean up task creators It's sometimes useful at least for statistics to keep a task count. It's easy to do by forcing the rare task creators to always use the same functions to create/destroy a task.	2009-03-21 18:13:21 +01:00
Willy Tarreau	26ca34e66e	[BUG] scheduler: fix improper handling of duplicates __task_queue() The top of a duplicate tree is not where bit == -1 but at the most negative bit. This was causing tasks to be queued in reverse order within duplicates. While this is not dramatic, it's incorrect and might lead to longer than expected duplicate depths under some circumstances.	2009-03-21 12:57:06 +01:00
Willy Tarreau	e35c94a748	[MEDIUM] scheduler: get rid of the 4 trees thanks and use ebtree v4.1 Since we're now able to search from a precise expiration date in the timer tree using ebtree 4.1, we don't need to maintain 4 trees anymore. Not only does this simplify the code a lot, but it also ensures that we can always look 24 days back and ahead, which doubles the ability of the previous scheduler. Indeed, while based on absolute values, the timer tree is now relative to <now> as we can always search from <now>-31 bits. The run queue uses the exact same principle now, and is now simpler and a bit faster to process. With these changes alone, an overall 0.5% performance gain was observed. Tests were performed on the few wrapping cases and everything works as expected.	2009-03-21 10:25:14 +01:00
Willy Tarreau	844553303d	[BUG] session: errors were not reported in termination flags in TCP mode In order to get termination flags properly updated, the session was relying a bit too much on http_return_srv_error() which is http-centric. A generic srv_error function was implemented in the session in order to catch all connection abort situations. It was then noticed that a request abort during a connection attempt was not reported, which is now fixed. Read and write errors/timeouts were not logged either. It was necessary to add those tests at 4 new locations. Now it looks like everything is correctly logged. Most likely some error checking code could now be removed from some analysers.	2009-03-15 22:34:05 +01:00
Willy Tarreau	ff01a21ebe	[MINOR] cfgparse: some cleanups in the consistency checks Check for servers in health mode, for health mode in pure-backends. Some code have been refactored for better organization.	2009-03-15 13:46:16 +01:00
Willy Tarreau	e8a28bf165	[MINOR] buffers: implement buffer_flush() This function will flush the buffer's data, which means that all data remaining in the buffer will be scheduled for sending.	2009-03-08 21:12:04 +01:00
Willy Tarreau	6f0aa476bd	[CLEANUP] buffer_flush() was misleading, rename it as buffer_erase	2009-03-08 20:33:29 +01:00
Willy Tarreau	531cf0cf8d	[OPTIM] task: reduce the number of calls to task_queue() Most of the time, task_queue() will immediately return. By extracting the preliminary checks and putting them in an inline function, we can significantly reduce the number of calls to the function itself, and most of the tests can be optimized away due to the caller's context. Another minor improvement in process_runnable_tasks() consisted in taking benefit from the processor's branch prediction unit by making a special case of the process_session() callback which is by far the most common one. All this improved performance by about 1%, mainly during the call from process_runnable_tasks().	2009-03-08 16:35:27 +01:00
Willy Tarreau	d0a201b35c	[CLEANUP] task: distinguish between clock ticks and timers Timers are unsigned and used as tree positions. Ticks are signed and used as absolute date within current time frame. While the two are normally equal (except zero), it's important not to confuse them in the code as they are not interchangeable. We add two inline functions to turn each one into the other. The comments have also been moved to the proper location, as it was not easy to understand what was a tick and what was a timer unit.	2009-03-08 15:58:07 +01:00
Willy Tarreau	26c250683f	[MEDIUM] minor update to the task api: let the scheduler queue itself All the tasks callbacks had to requeue the task themselves, and update a global timeout. This was not convenient at all. Now the API has been simplified. The tasks callbacks only have to update their expire timer, and return either a pointer to the task or NULL if the task has been deleted. The scheduler will take care of requeuing the task at the proper place in the wait queue.	2009-03-08 09:38:41 +01:00
Willy Tarreau	4726f53794	[OPTIM] task: don't unlink a task from a wait queue when waking it up In many situations, we wake a task on an I/O event, then queue it exactly where it was. This is a real waste because we delete/insert tasks into the wait queue for nothing. The only reason for this is that there was only one tree node in the task struct. By adding another tree node, we can have one tree for the timers (wait queue) and one tree for the priority (run queue). That way, we can have a task both in the run queue and wait queue at the same time. The wait queue now really holds timers, which is what it was designed for. The net gain is at least 1 delete/insert cycle per session, and up to 2-3 depending on the workload, since we save one cycle each time the expiration date is not changed during a wake up.	2009-03-08 07:59:18 +01:00
Willy Tarreau	79584225e5	[OPTIM] rate-limit: cleaner behaviour on low rates and reduce consumption The rate-limit was applied to the smoothed value which does a special case for frequencies below 2 events per period. This caused irregular limitations when set to 1 session per second. The proper way to handle this is to compute the number of remaining events that can occur without reaching the limit. This is what has been added. It also has the benefit that the frequency calculation is now done once when entering event_accept(), before the accept() loop, and not once per accept() loop anymore, thus saving a few CPU cycles during very high loads. With this fix, rate limits of 1/s are perfectly respected.	2009-03-06 09:18:27 +01:00
Willy Tarreau	7f062c4193	[MEDIUM] measure and report session rate on frontend, backends and servers With this change, all frontends, backends, and servers maintain a session counter and a timer to compute a session rate over the last second. This value will be very useful because it varies instantly and can be used to check thresholds. This value is also reported in the stats in a new "rate" column.	2009-03-05 18:43:00 +01:00
Willy Tarreau	74808cb907	[MEDIUM] implement error dump on unix socket with "show errors" The new "show errors" command sent on a unix socket will dump all captured request and response errors for all proxies. It is also possible to bound the log to frontends and backends whose ID is passed as an optional parameter. The output provides information about frontend, backend, server, session ID, source address, error type, and error position along with a complete dump of the request or response which has caused the error. If a new error scratches the one currently being reported, then the dump is aborted with a warning message, and processing goes on to next error.	2009-03-04 15:53:18 +01:00
Willy Tarreau	3eba98aa57	[MEDIUM] splice: make use of pipe pools Using pipe pools makes pipe management a lot easier. It also allows to remove quite a bunch of #ifdefs in areas which depended on the presence or not of support for kernel splicing. The buffer now holds a pointer to a pipe structure which is always NULL except if there are still data in the pipe. When it needs to use that pipe, it dynamically allocates it from the pipe pool. When the data is consumed, the pipe is immediately released. That way, there is no need anymore to care about pipe closure upon session termination, nor about pipe creation when trying to use splice(). Another immediate advantage of this method is that it considerably reduces the number of pipes needed to use splice(). Tests have shown that even with 0.2 pipe per connection, almost all sessions can use splice(), because the same pipe may be used by several consecutive calls to splice().	2009-01-25 13:56:13 +01:00
Willy Tarreau	982b6e37e4	[MEDIUM] introduce pipe pools A new data type has been added : pipes. Some pre-allocated empty pipes are maintained in a pool for users such as splice which use them a lot for very short times. Pipes are allocated using get_pipe() and released using put_pipe(). Pipes which are released with pending data are immediately killed. The struct pipe is small (16 to 20 bytes) and may even be further reduced by unifying ->data and ->next. It would be nice to have a dedicated cleanup task which would watch for the pipes usage and destroy a few of them from time to time.	2009-01-25 13:49:53 +01:00
Willy Tarreau	259de1b702	[MINOR] introduce structures required to support Linux kernel splicing When CONFIG_HAP_LINUX_SPLICE is defined, the buffer structure will be slightly enlarged to support information needed for kernel splicing on Linux. A first attempt consisted in putting this information into the stream interface, but in the long term, it appeared really awkward. This version puts the information into the buffer. The platform-dependant part is conditionally added and will only enlarge the buffers when compiled in. One new flag has also been added to the buffers: BF_KERN_SPLICING. It indicates that the application considers it is appropriate to use splicing to forward remaining data.	2009-01-18 21:56:21 +01:00
Willy Tarreau	03d60bbaf9	[OPTIM] buffer: replace rlim by max_len In the buffers, the read limit used to leave some place for header rewriting was set by a pointer to the end of the buffer. Not only this required subtracts at every place in the code, but this will also soon not be usable anymore when we want to support keepalive. Let's replace this with a length limit, comparable to the buffer's length. This has also sightly reduced the code size.	2009-01-09 11:14:39 +01:00
Willy Tarreau	0abebcc0fb	[MEDIUM] i/o: rework ->to_forward and ->send_max The way the buffers and stream interfaces handled ->to_forward was really not handy for multiple reasons. Now we've moved its control to the receive-side of the buffer, which is also responsible for keeping send_max up to date. This makes more sense as it now becomes possible to send some pre-formatted data followed by forwarded data. The following explanation has also been added to buffer.h to clarify the situation. Right now, tests show that the I/O is behaving extremely well. Some work will have to be done to adapt existing splice code though. /* Note about the buffer structure The buffer contains two length indicators, one to_forward counter and one send_max limit. First, it must be understood that the buffer is in fact split in two parts : - the visible data (->data, for ->l bytes) - the invisible data, typically in kernel buffers forwarded directly from the source stream sock to the destination stream sock (->splice_len bytes). Those are used only during forward. In order not to mix data streams, the producer may only feed the invisible data with data to forward, and only when the visible buffer is empty. The consumer may not always be able to feed the invisible buffer due to platform limitations (lack of kernel support). Conversely, the consumer must always take data from the invisible data first before ever considering visible data. There is no limit to the size of data to consume from the invisible buffer, as platform-specific implementations will rarely leave enough control on this. So any byte fed into the invisible buffer is expected to reach the destination file descriptor, by any means. However, it's the consumer's responsibility to ensure that the invisible data has been entirely consumed before consuming visible data. This must be reflected by ->splice_len. This is very important as this and only this can ensure strict ordering of data between buffers. The producer is responsible for decreasing ->to_forward and increasing ->send_max. The ->to_forward parameter indicates how many bytes may be fed into either data buffer without waking the parent up. The ->send_max parameter says how many bytes may be read from the visible buffer. Thus it may never exceed ->l. This parameter is updated by any buffer_write() as well as any data forwarded through the visible buffer. The consumer is responsible for decreasing ->send_max when it sends data from the visible buffer, and ->splice_len when it sends data from the invisible buffer. A real-world example consists in part in an HTTP response waiting in a buffer to be forwarded. We know the header length (300) and the amount of data to forward (content-length=9000). The buffer already contains 1000 bytes of data after the 300 bytes of headers. Thus the caller will set ->send_max to 300 indicating that it explicitly wants to send those data, and set ->to_forward to 9000 (content-length). This value must be normalised immediately after updating ->to_forward : since there are already 1300 bytes in the buffer, 300 of which are already counted in ->send_max, and that size is smaller than ->to_forward, we must update ->send_max to 1300 to flush the whole buffer, and reduce ->to_forward to 8000. After that, the producer may try to feed the additional data through the invisible buffer using a platform-specific method such as splice(). */	2009-01-09 10:15:03 +01:00
Willy Tarreau	dcef33fa9b	[MINOR] add the splice_len member to the buffer struct in preparation of splice support In preparation of splice support, let's add the splice_len member to the buffer struct. An earlier implementation made it conditional, which made the whole logics very complex due to a large number of ifdefs. Now BF_EMPTY is only set once both buf->l and buf->splice_len are null. Splice_len is initialized to zero during buffer creation and is currently not changed, so the whole logics remains unaffected. When splice gets merged, splice_len will reflect the number of bytes in flight out of the buffer but not yet sent, typically in a pipe for the Linux case.	2009-01-09 10:15:02 +01:00
Willy Tarreau	6b66f3e4f6	[MAJOR] implement autonomous inter-socket forwarding If an analyser sets buf->to_forward to a given value, that many data will be forwarded between the two stream interfaces attached to a buffer without waking the task up. The same applies once all analysers have been released. This saves a large amount of calls to process_session() and a number of task_dequeue/queue.	2009-01-09 10:15:02 +01:00
Willy Tarreau	3ffeba1f67	[MEDIUM] enable inter-stream_interface wakeup calls By letting the producer tell the consumer there is data to check, and the consumer tell the producer there is some space left again, we can cut in half the number of session wakeups. This is also an important starting point for future splicing support.	2008-12-28 11:09:02 +01:00
Willy Tarreau	86491c3164	[MEDIUM] indicate when we don't care about read timeout Sometimes we don't care about a read timeout, for instance, from the client when waiting for the server, but we still want the client to be able to read. Till now it was done by articially forcing the read timeout to ETERNITY. But this will cause trouble when we want the low level stream sock to communicate without waking the session up. So we add a BF_READ_NOEXP flag to indicate that when the read timeout is to be set, it might have to be set to ETERNITY. Since BF_READ_ENA was not used, we replaced this flag.	2008-12-28 11:06:40 +01:00

... 6 7 8 9 10 ...

544 Commits