haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-25 00:21:24 +02:00

Author	SHA1	Message	Date
Willy Tarreau	a24adf0795	MAJOR: session: only wake up as many sessions as available buffers permit We've already experimented with three wake up algorithms when releasing buffers : the first naive one used to wake up far too many sessions, causing many of them not to get any buffer. The second approach which was still in use prior to this patch consisted in waking up either 1 or 2 sessions depending on the number of FDs we had released. And this was still inaccurate. The third one tried to cover the accuracy issues of the second and took into consideration the number of FDs the sessions would be willing to use, but most of the time we ended up waking up too many of them for nothing, or deadlocking by lack of buffers. This patch completely removes the need to allocate two buffers at once. Instead it splits allocations into critical and non-critical ones and implements a reserve in the pool for this. The deadlock situation happens when all buffers are allocated for requests pending in a maxconn-limited server queue, because then there's no more way to allocate buffers for responses, and these responses are critical to release the server's connection in order to release the pending requests. In fact maxconn on a server creates a dependence between sessions and particularly between oldest session's responses and latest session's requests. Thus, it is mandatory to get a free buffer for a response in order to release a server connection which will permit to release a request buffer. Since we definitely have non-symmetrical buffers, we need to implement this logic in the buffer allocation mechanism. What this commit does is implement a reserve of buffers which can only be allocated for responses and that will never be allocated for requests. This is made possible by the requester indicating how much margin it wants to leave after the allocation succeeds. 
Thus it is a cooperative allocation mechanism : the requester (process_session() in general) prefers not to get a buffer in order to respect other's need for response buffers. The session management code always knows if a buffer will be used for requests or responses, so that is not difficult : - either there's an applet on the initiator side and we really need the request buffer (since currently the applet is called in the context of the session) - or we have a connection and we really need the response buffer (in order to support building and sending an error message back) This reserve ensures that we don't take all allocatable buffers for requests waiting in a queue. The downside is that all the extra buffers are really allocated to ensure they can be allocated. But with small values it is not an issue. With this change, we don't observe any more deadlocks even when running with maxconn 1 on a server under severely constrained memory conditions. The code becomes a bit tricky, it relies on the scheduler's run queue to estimate how many sessions are already expected to run so that it doesn't wake up everyone with too few resources. A better solution would probably consist in having two queues, one for urgent requests and one for normal requests. A failed allocation for a session dealing with an error, a connection event, or the need for a response (or request when there's an applet on the left) would go to the urgent request queue, while other requests would go to the other queue. Urgent requests would be served from 1 entry in the pool, while the regular ones would be served only according to the reserve. Despite not yet having this, it works remarkably well. This mechanism is quite efficient, we don't perform too many wake up calls anymore. For 1 million sessions elapsed during massive memory contention, we observe about 4.5M calls to process_session() compared to 4.0M without memory constraints. 
Previously we used to observe up to 16M calls, which roughly means 12M failures. During a test run under high memory constraints (limit enforced to 27 MB instead of the 58 MB normally needed), performance used to drop by 53% prior to this patch. Now with this patch instead it increases by about 1.5%. The best effect of this change is that by limiting the memory usage to about 2/3 to 3/4 of what is needed by default, it's possible to increase performance by up to about 18% mainly due to the fact that pools are reused more often and remain hot in the CPU cache (observed on regular HTTP traffic with 20k objects, buffers.limit = maxconn/10, buffers.reserve = limit/2). Below is an example of a scenario which used to cause a deadlock previously : - connection is received - two buffers are allocated in process_session() then released - one is allocated when receiving an HTTP request - the second buffer is allocated then released in process_session() for request parsing then connection establishment. - poll() says we can send, so the request buffer is sent and released - process session gets notified that the connection is now established and allocates two buffers then releases them - all other sessions do the same till one cannot get the request buffer without hitting the margin - and now the server responds. stream_interface allocates the response buffer and manages to get it since it's higher priority being for a response. - but process_session() cannot allocate the request buffer anymore => We could end up with all buffers used by responses so that none may be allocated for a request in process_session(). When the applet processing leaves the session context, the test will have to be changed so that we always allocate a response buffer regardless of the left side (eg: H2->H1 gateway). 
A final improvement would consist in being able to only retry the failed I/O operation without waking up a task, but to date all experiments to achieve this have proven not to be reliable enough.	2014-12-24 23:47:33 +01:00
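The cooperative allocation described in a24adf0795 can be sketched as follows. This is a minimal model and not the haproxy code: the pool only counts buffers, and the names sketch_pool, sketch_take(), sketch_take_for_request() and sketch_take_for_response() are hypothetical. The requester passes the margin it wants to leave free after the allocation succeeds; requests leave the whole reserve, responses leave nothing.

```c
/* Hypothetical, simplified pool: it only counts buffers and holds no memory. */
struct sketch_pool {
    unsigned int limit; /* total number of buffers the pool may hand out */
    unsigned int used;  /* buffers currently handed out */
};

/* Take one buffer only if at least <margin> buffers remain free afterwards.
 * Returns 1 on success, 0 when the caller has to wait. */
int sketch_take(struct sketch_pool *p, unsigned int margin)
{
    unsigned int avail = p->limit - p->used;

    if (avail <= margin)
        return 0;
    p->used++;
    return 1;
}

/* Requests are non-critical: they must leave the whole reserve untouched. */
int sketch_take_for_request(struct sketch_pool *p, unsigned int reserve)
{
    return sketch_take(p, reserve);
}

/* Responses are critical: they are allowed to dig into the reserve. */
int sketch_take_for_response(struct sketch_pool *p)
{
    return sketch_take(p, 0);
}
```

With a limit of 4 and a reserve of 2, only two request buffers can be taken, and the two remaining buffers stay available for responses, which is what lets a server connection be released under contention.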
Willy Tarreau	9841019e42	MINOR: stats: report a "waiting" flags for sessions This flag indicates if the session is still waiting for some memory.	2014-12-24 23:47:33 +01:00
Willy Tarreau	10fc09e872	MAJOR: session: only allocate buffers when needed A session doesn't need buffers all the time, especially when they're empty. With this patch, we don't allocate buffers anymore when the session is initialized, we only allocate them in two cases : - during process_session() - during I/O operations During process_session(), we try hard to allocate both buffers at once so that we know for sure that a started operation can complete. Indeed, a previous version of this patch used to allocate one buffer at a time, but it can result in a deadlock when all buffers are allocated for requests for example, and there's no buffer left to emit error responses. Here, if any of the buffers cannot be allocated, the whole operation is cancelled and the session is added at the tail of the buffer wait queue. At the end of process_session(), a call to session_release_buffers() is done so that we can offer unused buffers to other sessions waiting for them. For I/O operations, we only need to allocate a buffer on the Rx path. For this, we only allocate a single buffer but ensure that at least two are available to avoid the deadlock situation. In case buffers are not available, SI_FL_WAIT_ROOM is set on the stream interface and the session is queued. Unused buffers resulting either from a successful send() or from an unused read buffer are offered to pending sessions during the ->wake() callback.	2014-12-24 23:47:33 +01:00
Willy Tarreau	bf883e0aa7	MAJOR: session: implement a wait-queue for sessions who need a buffer When a session_alloc_buffers() fails to allocate one or two buffers, it subscribes the session to buffer_wq, and waits for another session to release buffers. It's then removed from the queue and woken up with TASK_WAKE_RES, and can attempt its allocation again. We decide to try to wake as many waiters as we release buffers so that if we release 2 and two waiters need only once, they both have their chance. We must never come to the situation where we don't wake enough tasks up. It's common to release buffers after the completion of an I/O callback, which can happen even if the I/O could not be performed due to half a failure on memory allocation. In this situation, we don't want to move out of the wait queue the session that was just added, otherwise it will never get any buffer. Thus, we only force ourselves out of the queue when freeing the session. Note: at the moment, since session_alloc_buffers() is not used, no task is subscribed to the wait queue.	2014-12-24 23:47:33 +01:00
Willy Tarreau	656859d478	MEDIUM: session: implement a basic atomic buffer allocator This patch introduces session_alloc_recv_buffer(), session_alloc_buffers() and session_release_buffers() whose purpose will be to allocate missing buffers and release unneeded ones around the process_session() and during I/O operations. I/O callbacks only need a single buffer for recv operations and none for send. However we still want to ensure that we don't pick the last buffer. That's what session_alloc_recv_buffer() is for. This allocator is atomic in that it always ensures we can get 2 buffers or fails. Here, if any of the buffers is not ready and cannot be allocated, the operation is cancelled. The purpose is to guarantee that we don't enter into the deadlock where all buffers are allocated by the same size of all sessions. A queue will have to be implemented for failed allocations. For now they're just reported as failures.	2014-12-24 23:47:32 +01:00
Willy Tarreau	909e267be0	MINOR: session: group buffer allocations together We'll soon want to release buffers together upon failure so we need to allocate them after the channels. Let's change this now. There's no impact on the behaviour, only the error path is unrolled slightly differently. The same was done in peers.	2014-12-24 23:47:32 +01:00
Willy Tarreau	f2f7d6b27b	MEDIUM: buffer: add a new buf_wanted dummy buffer to report failed allocations Doing so ensures that even when no memory is available, we leave the channel in a sane condition. There's a special case in proto_http.c regarding the compression, we simply pre-allocate the tmpbuf to point to the dummy buffer. Not reusing &buf_empty for this allows the rest of the code to differentiate an empty buffer that's not used from an empty buffer that results from a failed allocation which has the same semantics as a buffer full.	2014-12-24 23:47:32 +01:00
Willy Tarreau	2a4b54359b	MEDIUM: buffer: always assign a dummy empty buffer to channels Channels are now created with a valid pointer to a buffer before the buffer is allocated. This buffer is a global one called "buf_empty" and of size zero. Thus it prevents any activity from being performed on the buffer and still ensures that chn->buf may always be dereferenced. b_free() also resets the buffer to &buf_empty, and was split into b_drop() which does not reset the buffer.	2014-12-24 23:47:32 +01:00
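The dummy empty buffer idea from 2a4b54359b can be illustrated with a small sketch. Everything below is a simplified assumption: the sketch_* names are hypothetical, the buffer only tracks sizes instead of holding data, and malloc() stands in for the pool. The point shown is that the channel's buffer pointer can always be dereferenced, and that a zero-sized buffer offers no room, so no activity happens on it.

```c
#include <stdlib.h>

/* Hypothetical, simplified types: sizes only, no data area. */
struct sketch_buffer { size_t size; size_t len; };
struct sketch_channel { struct sketch_buffer *buf; };

/* Shared zero-sized buffer: never written to, never freed. */
struct sketch_buffer sketch_buf_empty = { 0, 0 };

/* A new channel points to the dummy buffer until a real one is allocated. */
void sketch_channel_init(struct sketch_channel *chn)
{
    chn->buf = &sketch_buf_empty;
}

/* Attach a real buffer. On failure the channel keeps the dummy buffer. */
int sketch_alloc(struct sketch_channel *chn, size_t size)
{
    struct sketch_buffer *b;

    if (chn->buf != &sketch_buf_empty)
        return 1; /* already allocated */
    b = malloc(sizeof(*b));
    if (!b)
        return 0;
    b->size = size;
    b->len = 0;
    chn->buf = b;
    return 1;
}

/* Release the buffer and fall back to the dummy one. */
void sketch_free(struct sketch_channel *chn)
{
    if (chn->buf != &sketch_buf_empty)
        free(chn->buf);
    chn->buf = &sketch_buf_empty;
}

/* Free room in the channel's buffer; always 0 for the dummy buffer. */
size_t sketch_room(const struct sketch_channel *chn)
{
    return chn->buf->size - chn->buf->len;
}
```

Callers never need a NULL check on chn->buf in this model, and calling sketch_free() twice is harmless because the second call only sees the dummy buffer.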
Willy Tarreau	7dfca9daec	MINOR: buffer: only use b_free to release buffers We don't call pool_free2(pool2_buffers) anymore, we only call b_free() to do the job. This ensures that we can start to centralize the releasing of buffers.	2014-12-24 23:47:32 +01:00
Willy Tarreau	696a2910a0	MINOR: buffer: move buffer initialization after channel initialization It's not clean to initialize the buffer before the channel since it dereferences one pointer in the channel. Also we'll want to let the channel pre-initialize the buffer, so let's ensure that the channel is always initialized prior to the buffers.	2014-12-24 23:47:32 +01:00
Willy Tarreau	e583ea583a	MEDIUM: buffer: use b_alloc() to allocate and initialize a buffer b_alloc() now allocates a buffer and initializes it to the size specified in the pool minus the size of the struct buffer itself. This ensures that callers do not need to care about buffer details anymore. Also this never applies memory poisoning, which is slow and useless on buffers.	2014-12-24 23:47:32 +01:00
Willy Tarreau	474cf54a97	MINOR: buffer: reset a buffer in b_reset() and not channel_init() We'll soon need to be able to switch buffers without touching the channel, so let's move buffer initialization out of channel_init(). We had the same in compression.c.	2014-12-24 23:47:31 +01:00
Willy Tarreau	a885f6dc65	MEDIUM: memory: improve pool_refill_alloc() to pass a refill count Till now this function would only allocate one entry at a time. But with dynamic buffers we'll like to allocate the number of missing entries to properly refill the pool. Let's modify it to take a minimum amount of available entries. This means that when we know we need at least a number of available entries, we can ask to allocate all of them at once. It also ensures that we don't move the pointers back and forth between the caller and the pool, and that we don't call pool_gc2() for each failed malloc. Instead, it's called only once and the malloc is only allowed to fail once.	2014-12-24 23:47:31 +01:00
Willy Tarreau	0262241e26	MINOR: memory: cut pool allocator in 3 layers pool_alloc2() used to pick the entry from the pool, fall back to pool_refill_alloc(), and to perform the poisoning itself, which pool_refill_alloc() was also doing. While this led to optimal code size, it imposes memory poisoning on the buffers as well, which is extremely slow on large buffers. This patch cuts the allocator in 3 layers : - a layer to pick the first entry from the pool without falling back to pool_refill_alloc() : pool_get_first() - a layer to allocate a dirty area by falling back to pool_refill_alloc() but never performing the poisoning : pool_alloc_dirty() - pool_alloc2() which calls the latter and optionally poisons the area No functional changes were made.	2014-12-24 23:47:31 +01:00
Willy Tarreau	4f31fc2f28	BUG/MEDIUM: compression: correctly report zlib_mem In zlib we track memory usage. The problem is that the way alloc_zlib() and free_zlib() account for memory is different, resulting in variations that can lead to negative zlib_mem being reported. The alloc() function uses the requested size while the free() function uses the pool size. The difference can happen when pools are shared with other pools of similar size. The net effect is that zlib_mem can be reported negative with a slowly decreasing count, and over the long term the limit will not be enforced anymore. The fix is simple : let's use the pool size in both cases, which is also the exact value when it comes to memory usage. This fix must be backported to 1.5.	2014-12-24 18:19:50 +01:00
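The accounting drift described in 4f31fc2f28 can be reproduced with a tiny model. This is only a sketch under assumptions: sketch_account() is a hypothetical helper, and the sizes used in the example (100 bytes requested, 128 bytes per pool entry) are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical model of the counter: each cycle charges <charged_on_alloc>
 * bytes at allocation time and credits <charged_on_free> bytes at release
 * time. Returns the counter value after <cycles> alloc/free pairs. */
long sketch_account(long zlib_mem, size_t charged_on_alloc,
                    size_t charged_on_free, unsigned int cycles)
{
    while (cycles--) {
        zlib_mem += (long)charged_on_alloc;
        zlib_mem -= (long)charged_on_free;
    }
    return zlib_mem;
}
```

Charging the requested size (100) on allocation but the pool size (128) on release loses 28 bytes per cycle, so the counter slowly goes negative. Charging the pool size on both sides keeps it at zero.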
Willy Tarreau	529c13933b	BUG/MAJOR: namespaces: conn->target is not necessarily a server create_server_socket() used to dereference objt_server(conn->target), but if the target is not a server (eg: a proxy) then it's NULL and we get a segfault. This can be reproduced with a proxy using "dispatch" with no server, even when namespaces are disabled, because that code is not #ifdef'd. The fix consists in first checking if the target is a server. This fix does not need to be backported, this is 1.6-only.	2014-12-24 13:47:55 +01:00
Willy Tarreau	57767b8032	BUG/MEDIUM: memory: fix freeing logic in pool_gc2() There's a long-standing bug in pool_gc2(). It tries to protect the pool against releasing of too many entries but the formula is wrong as it compares allocated to minavail instead of (allocated-used) to minavail. Under memory contention, it ends up releasing more than what is granted by minavail and causes trouble to the dynamic buffer allocator. This bug is in fact major by itself, but since minavail has never been used till now, there is no impact at least in mainline. A backport to 1.5 is desired anyway in case any future backport or out-of-tree patch relies on this.	2014-12-23 11:22:57 +01:00
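The difference between the two formulas in 57767b8032 can be shown on a counter-only model. This is a sketch and not the pool_gc2() code: struct sketch_counts and sketch_gc() are hypothetical, and entries are only counted, never really freed.

```c
/* Hypothetical counters mirroring the fields named in the commit message. */
struct sketch_counts {
    unsigned int allocated; /* entries obtained from the system */
    unsigned int used;      /* entries currently in use */
    unsigned int minavail;  /* free entries that should be preserved */
};

/* Release free entries one at a time and return how many were released.
 * With <fixed> == 0 the stop condition compares allocated to minavail (the
 * wrong formula); with <fixed> != 0 it compares (allocated - used) to
 * minavail. */
unsigned int sketch_gc(struct sketch_counts *p, int fixed)
{
    unsigned int released = 0;

    while (p->allocated > p->used) { /* at least one free entry remains */
        unsigned int kept = fixed ? p->allocated - p->used : p->allocated;

        if (kept <= p->minavail)
            break;
        p->allocated--;
        released++;
    }
    return released;
}
```

With 10 entries allocated, 8 in use and minavail set to 2, the wrong formula releases both free entries and leaves nothing available, while the corrected one releases none.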
Willy Tarreau	a69fc9f803	BUG/MAJOR: stream-int: properly check the memory allocation return In stream_int_register_handler(), we call si_alloc_appctx(si) but as a mistake, instead of checking the return value for a NULL, we test <si>. This bug was discovered under extreme memory contention (memory for only two buffers with 500 connections waiting) and after 3 million failed connections. While it was very hard to produce it, the fix is tagged major because in theory it could happen when haproxy runs with a very low "-m" setting preventing from allocating just the few bytes needed for an appctx. But most users will never be able to trigger it. The fix was confirmed to address the bug. This fix must be backported to 1.5.	2014-12-23 11:22:39 +01:00
Thierry FOURNIER	fe1ebcd2cf	BUG/MAJOR: ns: HAProxy segfault if the cli_conn is not from a network connection The patch "MAJOR: namespace: add Linux network namespace support" doesn't permit to use internal data producer like a "peers synchronisation" system. The result is a segfault when the internal application starts. This patch fixes the commit b3e54fe387c7c1ea750f39d3029672d640c499f9 It is introduced in 1.6dev version, it doesn't need to be backported.	2014-12-19 23:39:29 +01:00
Thierry FOURNIER	07e78c50b5	MINOR: map/acl/dumpstats: remove the "Done." message By convention, the HAProxy CLI doesn't return a message if the operation is successfully done. The MAP and ACL commands return the "Done." message, and it adds noise to the output during big MAP or ACL injection.	2014-12-18 23:29:46 +01:00
Willy Tarreau	f6b7001338	BUG/MEDIUM: config: do not propagate processes between stopped processes Immo Goltz reported a case of segfault while parsing the config where we try to propagate processes across stopped frontends (those with a "disabled" statement). The fix is trivial. The workaround consists in commenting out these frontends, although not always easy. This fix must be backported to 1.5.	2014-12-18 14:03:31 +01:00
Willy Tarreau	8a95d8cd61	BUG/MINOR: config: fix typo in condition when propagating process binding propagate_processes() has a typo in a condition : if (!from->cap & PR_CAP_FE) return; The return is never taken because each proxy has at least one capability so !from->cap always evaluates to zero. Most of the time the caller already checks that <from> is a frontend. In the cases where it's not tested (use_backend, reqsetbe), the rules have been checked for the context to be a frontend as well, so in the end it had no nasty side effect. This should be backported to 1.5.	2014-12-18 14:03:31 +01:00
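The typo fixed in 8a95d8cd61 is an operator precedence issue: unary "!" binds tighter than binary "&". The sketch below isolates it. The function names are hypothetical and the flag values are assumptions chosen for the example; only the shape of the condition comes from the commit message.

```c
/* Assumed flag values, for illustration only. */
#define PR_CAP_FE 0x0001
#define PR_CAP_BE 0x0002

/* What the typo actually computes: "!cap & PR_CAP_FE" is parsed as
 * "(!cap) & PR_CAP_FE". As soon as <cap> has any bit set, !cap is 0 and
 * the whole expression is 0. */
int sketch_not_frontend_buggy(unsigned int cap)
{
    return (!cap) & PR_CAP_FE;
}

/* Intended test: true when the frontend capability bit is absent. */
int sketch_not_frontend_fixed(unsigned int cap)
{
    return !(cap & PR_CAP_FE);
}
```

For a backend-only proxy the buggy form yields 0, so the early return is never taken, while the fixed form yields 1.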
Godbach	d972203fbc	BUG/MINOR: parse: refer curproxy instead of proxy Since during parsing stage, curproxy always represents a proxy to be operated, it should be a mistake by referring proxy. Signed-off-by: Godbach <nylzhaowei@gmail.com>	2014-12-18 11:01:51 +01:00
Godbach	1f1fae6202	BUG/MINOR: http: fix typo: "401 Unauthorized" => "407 Unauthorized" 401 Unauthorized => 407 Unauthorized Signed-off-by: Godbach <nylzhaowei@gmail.com>	2014-12-17 17:05:49 +01:00
Godbach	d39ae7ddc9	CLEANUP: epoll: epoll_events should be allocated according to global.tune.maxpollevents Willy: commit f2e8ee2b introduced an optimization in the old speculative epoll code, which implemented its own event cache. It was needed to store that many events (it was bound to maxsock/4 btw). Now the event cache lives on its own and we don't need this anymore. And since events are allocated on the kernel side, we only need to allocate the events we want to return. As a result, absmaxevents will be not used anymore. Just remove the definition and the comment of it, replace it with global.tune.maxpollevents. It is also an optimization of memory usage for large amounts of sockets. Signed-off-by: Godbach <nylzhaowei@gmail.com>	2014-12-17 17:04:53 +01:00
Vincent Bernat	1228dc0e7a	BUG/MEDIUM: sample: fix random number upper-bound random() will generate a number between 0 and RAND_MAX. POSIX mandates RAND_MAX to be at least 32767. GNU libc uses (1<<31 - 1) as RAND_MAX. In smp_fetch_rand(), a reduction is done with a multiply and shift to avoid skewing the results. However, the shift was always 32 and hence the numbers were not distributed uniformly in the specified range. We fix that by dividing by RAND_MAX+1. gcc is smart enough to turn that into a shift: 0x000000000046ecc8 <+40>: shr $0x1f,%rax	2014-12-10 22:45:34 +01:00
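The effect of the wrong shift in 1228dc0e7a can be checked with plain integer arithmetic. The sketch below is not the smp_fetch_rand() code: the function names are hypothetical, and the generator's maximum is passed as a parameter so the example stays deterministic. It assumes a 31-bit generator (maximum 0x7fffffff), as the commit message states for GNU libc.

```c
#include <stdint.h>

/* Scale <r> into [0, range) assuming <r> spans 32 bits. With a 31-bit
 * generator, only the lower half of the target range can be reached. */
uint32_t sketch_scale_shift32(uint32_t r, uint32_t range)
{
    return (uint32_t)(((uint64_t)r * range) >> 32);
}

/* Scale <r> in [0, rmax] into [0, range) by dividing by rmax + 1. */
uint32_t sketch_scale_div(uint32_t r, uint32_t rmax, uint32_t range)
{
    return (uint32_t)(((uint64_t)r * range) / ((uint64_t)rmax + 1));
}
```

For the largest 31-bit input and a range of 100, the shift by 32 yields 49 whereas the division yields 99, so the first form never returns values in the upper half of the range.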
Godbach	f2dd68d0e0	DOC: fix a few typos include/types/proto_http.h: hwen -> when include/types/server.h: SRV_ST_DOWN -> SRV_ST_STOPPED src/backend.c: prefer-current-server -> prefer-last-server Signed-off-by: Godbach <nylzhaowei@gmail.com>	2014-12-10 05:34:55 +01:00
Lukas Tribus	e4e30f7d52	BUILD: ssl: use OPENSSL_NO_OCSP to detect OCSP support Since commit 656c5fa7e859 ("BUILD: ssl: disable OCSP when using boringssl) the OCSP code is bypassed when OPENSSL_IS_BORINGSSL is defined. The correct thing to do here is to use OPENSSL_NO_OCSP instead, which is defined for this exact purpose in openssl/opensslfeatures.h. This makes haproxy forward compatible if boringssl ever introduces full OCSP support with the additional benefit that it links fine against a OCSP-disabled openssl. Signed-off-by: Lukas Tribus <luky-37@hotmail.com>	2014-12-09 20:49:22 +01:00
Willy Tarreau	f3d3482c98	BUG/MEDIUM: tcp-checks: disable quick-ack unless next rule is an expect Using "option tcp-checks" without any rule is different from not using it at all in that checks are sent with the TCP quick ack mode enabled, causing servers to log incoming port probes. This commit fixes this behaviour by disabling quick-ack on tcp-checks unless the next rule exists and is an expect. All combinations were tested and now the behaviour is as expected : basic port probes are now doing a SYN-SYN/ACK-RST sequence. This fix must be backported to 1.5.	2014-12-08 12:11:28 +01:00
Willy Tarreau	d2a49592fa	BUG/MEDIUM: tcp-check: don't rely on random memory contents If "option tcp-check" is used and no "tcp-check" rule is specified, we only look at rule->action which dereferences the proxy's memory and which can randomly match TCPCHK_ACT_CONNECT or whatever else, causing a check to fail. This bug is the result of an incorrect fix attempted in commit f621bea ("BUG/MINOR: tcpcheck connect wrong behavior"). This fix must be backported into 1.5.	2014-12-08 11:52:28 +01:00
Willy Tarreau	e7b9ed33ee	BUG/MINOR: tcp-check: don't condition data polling on check type tcp_check_main() would condition the polling for writes on check->type, but this is absurd given that check->type == PR_O2_TCPCHK_CHK since this is the only way we can get there! This patch removes this confusing test.	2014-12-08 11:28:18 +01:00
Cyril Bonté	9ede66b06d	MEDIUM: checks: provide environment variables to the external checks The external command accepted 4 arguments, some with the value "NOT_USED" when not applicable. In order to make the external command more generic, this patch also provides the values in environment variables. This allows to provide more information. Currently, the supported environment variables are : PATH, as previously provided. HAPROXY_PROXY_NAME, the backend name HAPROXY_PROXY_ID, the backend id HAPROXY_PROXY_ADDR, the first bind address if available (or empty) HAPROXY_PROXY_PORT, the first bind port if available (or empty) HAPROXY_SERVER_NAME, the server name HAPROXY_SERVER_ID, the server id HAPROXY_SERVER_ADDR, the server address HAPROXY_SERVER_PORT, the server port if available (or empty) HAPROXY_SERVER_MAXCONN, the server max connections HAPROXY_SERVER_CURCONN, the current number of connections on the server	2014-12-02 21:44:33 +01:00
Cyril Bonté	777be861c5	MINOR: checks: allow external checks in backend sections Previously, external checks required to find at least one listener in order to pass the <proxy_address> and <proxy_port> arguments to the external script. It prevented from declaring external checks in backend sections and haproxy rejected the configuration. The listener is now optional and values "NOT_USED" are passed if no listener is found. For instance, this is the case with a backend section. This is specific to the 1.6 branch.	2014-12-02 21:44:33 +01:00
Willy Tarreau	83f2592bcd	BUG/MEDIUM: payload: ensure that a request channel is available Denys Fedoryshchenko reported a segfault when using certain sample fetch functions in the "tcp-request connection" rulesets despite the warnings. This is because some tests for the existence of the channel were missing. The fetches which were fixed are : - req.ssl_hello_type - rep.ssl_hello_type - req.ssl_sni This fix must be backported to 1.5.	2014-11-26 13:32:22 +01:00
Willy Tarreau	4deaf39243	BUG/MEDIUM: patterns: previous fix was incomplete Dmitry Sivachenko <trtrmitya@gmail.com> reported that commit 315ec42 ("BUG/MEDIUM: pattern: don't load more than once a pattern list.") relies on an uninitialised variable in the stack. While it used to work fine during the tests, if the uninitialized variable is non-null, some patterns may be aggregated if loaded multiple times, resulting in slower processing, which was the original issue it tried to address. The fix needs to be backported to 1.5.	2014-11-26 13:17:03 +01:00
Willy Tarreau	3b24641745	BUG/MAJOR: sessions: unlink session from list on out of memory Since embryonic sessions were introduced in 1.5-dev12 with commit 2542b53 ("MAJOR: session: introduce embryonic sessions"), a major bug remained present. If haproxy cannot allocate memory during session_complete() (for example, no more buffers), it will not unlink the new session from the sessions list. This will cause memory corruptions if the memory area from the session is reused for anything else, and may also cause bogus output on "show sess" on the CLI. This fix must be backported to 1.5.	2014-11-25 22:09:05 +01:00
Emeric Brun	c9a0f6d023	MINOR: samples: add the word converter. word(<index>,<delimiters>) Extracts the nth word considering given delimiters from an input string. Indexes start at 1 and delimiters are a string formatted list of chars.	2014-11-25 14:48:39 +01:00
Emeric Brun	f399b0debf	MINOR: samples: adds the field converter. field(<index>,<delimiters>) Extracts the substring at the given index considering given delimiters from an input string. Indexes start at 1 and delimiters are a string formatted list of chars.	2014-11-24 17:44:02 +01:00
Emeric Brun	54c4ac8417	MINOR: samples: adds the bytes converter. bytes(<offset>[,<length>]) Extracts some bytes from an input binary sample. The result is a binary sample starting at an offset (in bytes) of the original sample and optionally truncated at the given length.	2014-11-24 17:44:02 +01:00
Willy Tarreau	0f30d26dbf	MINOR: sample: add a few basic internal fetches (nbproc, proc, stopping) Sometimes, either for debugging or for logging we'd like to have a bit of information about the running process. Here are 3 new fetches for this : nbproc : integer Returns an integer value corresponding to the number of processes that were started (it equals the global "nbproc" setting). This is useful for logging and debugging purposes. proc : integer Returns an integer value corresponding to the position of the process calling the function, between 1 and global.nbproc. This is useful for logging and debugging purposes. stopping : boolean Returns TRUE if the process calling the function is currently stopping. This can be useful for logging, or for relaxing certain checks or helping close certain connections upon graceful shutdown.	2014-11-24 17:44:02 +01:00
Emeric Brun	4b9e80268e	BUG/MINOR: samples: fix unnecessary memcopy converting binary to string.	2014-11-24 17:44:02 +01:00
Willy Tarreau	42fb809cf4	BUG/MINOR: peers: the buffer size is global.tune.bufsize, not trash.size Currently this is harmless since trash.size is copied from global.tune.bufsize, but this may soon change when buffers become more dynamic. At least for consistency it should be backported to 1.5.	2014-11-24 15:40:57 +01:00
Thierry FOURNIER	315ec4217f	BUG/MEDIUM: pattern: don't load more than once a pattern list. A memory optimization can use the same pattern expression for many equal pattern lists (same parse method, index method and index_smp method). The pattern expression is returned by "pattern_new_expr", but this function doesn't indicate if the returned pattern is already in use. So, the caller function reloads the list of patterns in addition to the existing patterns. This behavior is not a problem with tree indexed patterns, but it grows the list indexed patterns. This fix adds a "reuse" flag in return of the function "pattern_new_expr". If the flag is set, I suppose that the patterns are already loaded. This fix must be backported into 1.5.	2014-11-24 15:40:16 +01:00
Willy Tarreau	5be2f35231	MAJOR: polling: centralize calls to I/O callbacks In order for HTTP/2 not to eat too much memory, we'll have to support on-the-fly buffer allocation, since most streams will have an empty request buffer at some point. Supporting allocation on the fly means being able to sleep inside I/O callbacks if a buffer is not available. Till now, the I/O callbacks were called from two locations : - when processing the cached events - when processing the polled events from the poller This change cleans up the design a bit further than what was started in 1.5. It now ensures that we never call any iocb from the poller itself and that instead, events learned by the poller are put into the cache. The benefit is important in terms of stability : we don't have to care anymore about the risk that new events are added into the poller while processing its events, and we're certain that updates are processed at a single location. To achieve this, we now modify all the fd_* functions so that instead of creating updates, they add/remove the fd to/from the cache depending on its state, and only create an update when the polling status reaches a state where it will have to change. Since the pollers make use of these functions to notify readiness (using fd_may_recv/fd_may_send), the cache is always up to date with the poller. Creating updates only when the polling status needs to change saves a significant amount of work for the pollers : a benchmark showed that on a typical TCP proxy test, the amount of updates per connection dropped from 11 to 1 on average. This also means that the update list is smaller and has more chances of not thrashing too many CPU cache lines. The first observed benefit is a net 2% performance gain on the connection rate. 
A second benefit is that when a connection is accepted, it's only when we're processing the cache, and the recv event is automatically added into the cache after the current one, resulting in this event to be processed immediately during the same loop. Previously we used to have a second run over the updates to detect if new events were added to catch them before waking up tasks. The next gain will be offered by the next steps on this subject consisting in implementing an I/O queue containing all cached events ordered by priority just like the run queue, and to be able to leave some events pending there as long as needed. That will allow us not to perform some FD processing if it's not the proper time for this (typically keep waiting for a buffer to be allocated if none is available for a recv()). And by only processing a small bunch of them, we'll allow priorities to take place even at the I/O level. As a result of this change, functions fd_alloc_or_release_cache_entry() and fd_process_polled_events() have disappeared, and the code dedicated to checking for new fd events after the callback during the poll() loop was removed as well. Despite the patch looking large, it's mostly a change of what function is called upon fd_*() and almost nothing was added.	2014-11-21 20:37:32 +01:00
Willy Tarreau	5506e3f8b6	BUG/MINOR: stats: correctly set the request/response analysers When enabling stats, response analysers were set on the request analyser list, which 1) has no effect, and 2) means we don't have the response analysers properly set. In practice these response analysers are set when the connection to the server or applet is established so we don't need/must not set them here. Fortunately this bug had no impact since the flags are distinct, but it definitely is confusing. It should be backported to 1.5.	2014-11-21 17:53:08 +01:00
KOVACS Krisztian	b3e54fe387	MAJOR: namespace: add Linux network namespace support This patch makes it possible to create binds and servers in separate namespaces. This can be used to proxy between multiple completely independent virtual networks (with possibly overlapping IP addresses) and a non-namespace-aware proxy implementation that supports the proxy protocol (v2). The setup is something like this:

net1 on VLAN 1 (namespace 1) -\
net2 on VLAN 2 (namespace 2) -- haproxy ==== proxy (namespace 0)
net3 on VLAN 3 (namespace 3) -/

The proxy is configured to make server connections through haproxy and to send the expected source/target addresses to haproxy using the proxy protocol. The network namespace setup on the haproxy node is something like this:

= 8< =
$ cat setup.sh
ip netns add 1
ip link add link eth1 type vlan id 1
ip link set eth1.1 netns 1
ip netns exec 1 ip addr add 192.168.91.2/24 dev eth1.1
ip netns exec 1 ip link set eth1.$id up
...
= 8< =

= 8< =
$ cat haproxy.cfg
frontend clients
    bind 127.0.0.1:50022 namespace 1 transparent
    default_backend scb

backend server
    mode tcp
    server server1 192.168.122.4:2222 namespace 2 send-proxy-v2
= 8< =

A bind line creates the listener in the specified namespace, and connections originating from that listener also have their network namespace set to that of the listener. A server line either forces the connection to be made in a specified namespace or may use the namespace from the client-side connection if that was set. For more details, please read the documentation included in the patch itself. Signed-off-by: KOVACS Tamas <ktamas@balabit.com> Signed-off-by: Sarkozi Laszlo <laszlo.sarkozi@balabit.com> Signed-off-by: KOVACS Krisztian <hidden@balabit.com>	2014-11-21 07:51:57 +01:00
KOVACS Krisztian	efd3aa9341	BUG/MEDIUM: connection: sanitize PPv2 header length before parsing address information Previously, if hdr_v2->len was less than the length of the protocol-specific address information, we could have read past the end of the buffer and initialized the sockaddr structure with junk. Signed-off-by: KOVACS Krisztian <hidden@balabit.com> [WT: this is only tagged medium since proxy protocol is only used from trusted sources] This must be backported to 1.5.	2014-11-21 07:45:17 +01:00
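The kind of check this fix describes can be sketched as follows. This is an illustration only, not the code from the patch: the function and constant names are hypothetical, signature validation is left out, and only the two TCP families are handled. The sizes follow the PROXY protocol v2 layout (16-byte fixed header with the family byte at offset 13 and a big-endian length at offsets 14-15, then 12 bytes of addresses for TCP over IPv4 and 36 bytes for TCP over IPv6).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the values come from the PROXY protocol v2 layout. */
enum {
    PP2_HDR_LEN_SKETCH   = 16, /* signature + ver_cmd + fam + len */
    PP2_ADDR_LEN_TCP4    = 12, /* 2 x IPv4 address + 2 x port */
    PP2_ADDR_LEN_TCP6    = 36  /* 2 x IPv6 address + 2 x port */
};

/* Minimal address block size for a family byte, or -1 if not handled here. */
int pp2_min_addr_len_sketch(uint8_t fam)
{
    switch (fam) {
    case 0x11: return PP2_ADDR_LEN_TCP4; /* TCP over IPv4 */
    case 0x21: return PP2_ADDR_LEN_TCP6; /* TCP over IPv6 */
    default:   return -1;
    }
}

/* Returns 1 if the address block may be parsed safely, 0 otherwise. */
int pp2_addr_len_ok_sketch(const uint8_t *buf, size_t buflen)
{
    assert(buf != NULL);

    if (buflen < PP2_HDR_LEN_SKETCH)
        return 0; /* fixed header not fully received */

    uint16_t len = (uint16_t)((buf[14] << 8) | buf[15]); /* network order */
    int need = pp2_min_addr_len_sketch(buf[13]);

    if (need < 0)
        return 0; /* family not covered by this sketch */
    if (len < need)
        return 0; /* announced length too short for the address block */
    if (buflen - PP2_HDR_LEN_SKETCH < len)
        return 0; /* announced length exceeds what the buffer holds */
    return 1;
}
```

Rejecting the header before touching the address block means the sockaddr structure is never filled from bytes that lie outside the announced payload.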
Willy Tarreau	9654e57fac	BUG/MAJOR: frontend: initialize capture pointers earlier Denys Fedoryshchenko reported and diagnosed a nasty bug caused by TCP captures, introduced in late 1.5-dev by commit 18bf01e ("MEDIUM: tcp: add a new tcp-request capture directive"). The problem is that we're using the array of capture pointers initially designed for HTTP usage only, and that this array was only reset when starting to process an HTTP request. In a tcp-only frontend, the pointers are not reset, and if the capture pool is shared, we can very well point to whatever other memory location, resulting in random crashes when tcp-request content captures are processed. The fix simply consists in initializing these pointers when the pools are prepared. A workaround for existing versions consists in either disabling TCP captures in tcp-only frontends, or in forcing the frontends to work in HTTP mode. Thanks to Denys for the amount of testing and detailed reports. This fix must be backported to 1.5.	2014-11-18 18:53:43 +01:00
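The principle behind this fix (reset the capture pointers when the storage is prepared, instead of relying on a later HTTP-only step) can be shown with a small sketch. The structure, the slot count and the function names below are hypothetical and much simpler than the real session and pool code; recycled pool memory is simulated by the caller filling the structure with junk.

```c
#include <assert.h>
#include <stdlib.h>

#define NB_CAPTURES_SKETCH 4 /* hypothetical number of capture slots */

/* Hypothetical stand-in for the per-session capture pointer array. */
struct capture_set_sketch {
    char *cap[NB_CAPTURES_SKETCH];
};

/* Called as soon as the storage is obtained, whatever the frontend mode,
 * so that no slot can keep a stale pointer from a previous user.
 */
void captures_prepare_sketch(struct capture_set_sketch *cs)
{
    assert(cs != NULL);
    for (int i = 0; i < NB_CAPTURES_SKETCH; i++)
        cs->cap[i] = NULL;
}

/* Releasing is only safe if every slot is either NULL or a valid pointer. */
void captures_release_sketch(struct capture_set_sketch *cs)
{
    assert(cs != NULL);
    for (int i = 0; i < NB_CAPTURES_SKETCH; i++) {
        free(cs->cap[i]); /* free(NULL) is a no-op */
        cs->cap[i] = NULL;
    }
}
```

Without the early reset, a TCP-only user of such a structure would hand whatever bytes were left in memory to the release step, which is the class of random crash the report describes.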
Willy Tarreau	743c128580	BUG/MINOR: config: don't inherit the default balance algorithm in frontends Tom Limoncelli from Stack Exchange reported a minor bug : the frontend inherits the LB parameters from the defaults sections. The impact is that if a "balance" directive uses any L7 parameter in the defaults sections and the frontend is in TCP mode, a warning is emitted about their incompatibility. The warning is harmless but a valid, sane config should never cause any warning to be reported. This fix should be backported into 1.5 and possibly 1.4.	2014-11-18 15:04:29 +01:00
Christian Ruppert	de898712a0	MEDIUM: regex: Use pcre_study always when PCRE is used, regardless of JIT pcre_study() was around long before JIT was added. It also seems to affect performance (positively) in some cases. Below I've attached some test results. The test is based on http://sljit.sourceforge.net/regex_perf.html (see bottom). It has been modified to just test pcre_study vs. no pcre_study. Note: This test does not try to match a specific header; it is instead run over a larger text with more and less complex patterns to make the differences clearer.

% ./runtest
'mark.txt' loaded. (Length: 19665221 bytes)
-----------------
Regex: 'Twain'
[pcre-nostudy] time: 14 ms (2388 matches)
[pcre-study] time: 21 ms (2388 matches)
-----------------
Regex: '^Twain'
[pcre-nostudy] time: 109 ms (100 matches)
[pcre-study] time: 109 ms (100 matches)
-----------------
Regex: 'Twain$'
[pcre-nostudy] time: 14 ms (127 matches)
[pcre-study] time: 16 ms (127 matches)
-----------------
Regex: 'Huck[a-zA-Z]+|Finn[a-zA-Z]+'
[pcre-nostudy] time: 695 ms (83 matches)
[pcre-study] time: 26 ms (83 matches)
-----------------
Regex: 'a[^x]{20}b'
[pcre-nostudy] time: 90 ms (12495 matches)
[pcre-study] time: 91 ms (12495 matches)
-----------------
Regex: 'Tom|Sawyer|Huckleberry|Finn'
[pcre-nostudy] time: 1236 ms (3015 matches)
[pcre-study] time: 34 ms (3015 matches)
-----------------
Regex: '.{0,3}(Tom|Sawyer|Huckleberry|Finn)'
[pcre-nostudy] time: 5696 ms (3015 matches)
[pcre-study] time: 5655 ms (3015 matches)
-----------------
Regex: '[a-zA-Z]+ing'
[pcre-nostudy] time: 1290 ms (95863 matches)
[pcre-study] time: 1167 ms (95863 matches)
-----------------
Regex: '^[a-zA-Z]{0,4}ing[^a-zA-Z]'
[pcre-nostudy] time: 136 ms (4507 matches)
[pcre-study] time: 134 ms (4507 matches)
-----------------
Regex: '[a-zA-Z]+ing$'
[pcre-nostudy] time: 1334 ms (5360 matches)
[pcre-study] time: 1214 ms (5360 matches)
-----------------
Regex: '^[a-zA-Z ]{5,}$'
[pcre-nostudy] time: 198 ms (26236 matches)
[pcre-study] time: 197 ms (26236 matches)
-----------------
Regex: '^.{16,20}$'
[pcre-nostudy] time: 173 ms (4902 matches)
[pcre-study] time: 175 ms (4902 matches)
-----------------
Regex: '([a-f](.[d-m].){0,2}[h-n]){2}'
[pcre-nostudy] time: 1242 ms (68621 matches)
[pcre-study] time: 690 ms (68621 matches)
-----------------
Regex: '([A-Za-z]awyer|[A-Za-z]inn)[^a-zA-Z]'
[pcre-nostudy] time: 1215 ms (675 matches)
[pcre-study] time: 952 ms (675 matches)
-----------------
Regex: '"[^"]{0,30}[?!\.]"'
[pcre-nostudy] time: 27 ms (5972 matches)
[pcre-study] time: 28 ms (5972 matches)
-----------------
Regex: 'Tom.{10,25}river|river.{10,25}Tom'
[pcre-nostudy] time: 705 ms (2 matches)
[pcre-study] time: 68 ms (2 matches)

In some cases it's more or less the same, but when it's faster, it is by a huge margin. It always depends on the pattern, the string(s) to match against etc. Signed-off-by: Christian Ruppert <c.ruppert@babiel.com>	2014-11-18 13:26:18 +01:00


16131 Commits