haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-20 06:01:23 +02:00

Author	SHA1	Message	Date
Willy Tarreau	34d4c3c13f	BUG/MINOR: http: abort request processing on filter failure Commit c600204 ("BUG/MEDIUM: regex: fix risk of buffer overrun in exp_replace()") added a control of failure on the response headers, but forgot to check for the error during request processing. So if the filters fail to apply, we could keep the request. It might cause some headers to silently fail to be added for example. Note that it's tagged MINOR because a standard configuration cannot make this case happen. The fix should be backported to 1.5 and 1.4 though.	2015-01-30 20:58:58 +01:00
Cyril Bonté	32602d2361	BUG/MINOR: checks: prevent http keep-alive with http-check expect Sébastien Rohaut reported that string negation in http-check expect didn't work as expected. The misbehaviour is caused by responses with HTTP keep-alive. When the condition is not met, haproxy awaits more data until the buffer is full or the connection is closed, resulting in a check timeout when "timeout check" is lower than the keep-alive timeout on the server side. In order to avoid the issue, when a "http-check expect" is used, haproxy will ask the server to disable keep-alive by automatically appending a "Connection: close" header to the request.	2015-01-30 00:43:34 +01:00
Willy Tarreau	aa435e7d7e	BUG/MINOR: http: fix incorrect header value offset in replace-hdr/replace-value The two http-req/http-resp actions "replace-hdr" and "replace-value" were expecting exactly one space after the colon, which is wrong. It was causing the first char not to be seen/modified when no space was present, and empty headers not to be modified either. Instead of using name->len+2, we must use ctx->val which points to the first character of the value even if there is no value. This fix must be backported into 1.5.	2015-01-29 14:01:34 +01:00
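A minimal C sketch of the offset fix, assuming the header line is available as a pointer/length pair together with the length of its name. hdr_value_offset() is a hypothetical helper, not HAProxy's ctx->val code. It skips the colon and any optional whitespace instead of assuming exactly one space (name->len+2), so "X-Foo:bar" keeps its first character and an empty "X-Foo:" still gets a valid offset:

```c
#include <stddef.h>

/* Hypothetical helper: return the offset of the first character of the
 * value in a header line such as "X-Foo:  bar", given the name length.
 * Instead of assuming one space after the colon (name_len + 2), skip the
 * colon and then any optional spaces/tabs. For an empty value the
 * returned offset equals len. */
static size_t hdr_value_offset(const char *line, size_t len, size_t name_len)
{
    size_t i = name_len + 1;        /* first byte after ':' */

    if (i > len)
        return len;                 /* malformed line: no colon at all */
    while (i < len && (line[i] == ' ' || line[i] == '\t'))
        i++;                        /* skip optional whitespace */
    return i;
}
```

With the old name_len + 2 rule, "X-Foo:bar" would start at offset 7 and miss the "b"; this sketch returns 6.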
Willy Tarreau	474b96ad41	MEDIUM: init: continue to enforce SYSTEM_MAXCONN with auto settings if set Commit d025648 ("MAJOR: init: automatically set maxconn and/or maxsslconn when possible") resulted in a case where if enough memory is available, a maxconn value larger than SYSTEM_MAXCONN could be computed, resulting in possibly overflowing other system resources (eg: kernel socket buffers, conntrack entries, etc). Let's bound any automatic maxconn to SYSTEM_MAXCONN if it is defined. Note that the value is set to DEFAULT_MAXCONN since SYSTEM_MAXCONN forces DEFAULT_MAXCONN, thus it is not an error.	2015-01-28 19:03:21 +01:00
Godbach	58048a2dc9	BUG/MINOR: parse: check the validity of size string in a more strict way If a stick table is defined as below: stick-table type ip size 50ka expire 300s HAProxy will stop parsing the size after "50k" and return the value directly. But such a size string should not be valid. The patch checks the next character to report an error if any. Signed-off-by: Godbach <nylzhaowei@gmail.com>	2015-01-28 11:23:11 +01:00
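The stricter check can be sketched in C as follows. parse_size_strict() is a hypothetical function, not HAProxy's parser; the sketch assumes binary multipliers (k = 1024, m = 1024*1024, g = 1024*1024*1024) and ignores overflow. The part that matters is the final test that nothing follows the optional suffix:

```c
#include <stdint.h>

/* Hypothetical strict size parser: a decimal number optionally followed
 * by one k/m/g suffix (case-insensitive). Any trailing character is an
 * error, so "50k" is accepted but "50ka" is rejected.
 * Returns 0 and stores the value in *out on success, -1 on error.
 * Overflow is not checked in this sketch. */
static int parse_size_strict(const char *s, uint64_t *out)
{
    uint64_t v = 0;

    if (*s < '0' || *s > '9')
        return -1;                  /* must start with a digit */
    while (*s >= '0' && *s <= '9')
        v = v * 10 + (uint64_t)(*s++ - '0');

    switch (*s) {
    case 'k': case 'K': v <<= 10; s++; break;
    case 'm': case 'M': v <<= 20; s++; break;
    case 'g': case 'G': v <<= 30; s++; break;
    default: break;
    }

    if (*s != '\0')                 /* reject trailing garbage such as "a" */
        return -1;
    *out = v;
    return 0;
}
```

Under these assumptions "50k" yields 51200 while "50ka" is reported as invalid.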
Willy Tarreau	9770787e70	MEDIUM: samples: provide basic arithmetic and bitwise operators This commit introduces a new category of converters. They are bitwise and arithmetic operators which support performing basic operations on integers. Some bitwise operations are supported (and, or, xor, cpl) and some arithmetic operations are supported (add, sub, mul, div, mod, neg). Some comparators are provided (odd, even, not, bool) which make it possible to report a match without having to write an ACL. The detailed list of new operators as they appear in the doc is : add(<value>) Adds <value> to the input value of type unsigned integer, and returns the result as an unsigned integer. and(<value>) Performs a bitwise "AND" between <value> and the input value of type unsigned integer, and returns the result as an unsigned integer. bool Returns a boolean TRUE if the input value of type unsigned integer is non-null, otherwise returns FALSE. Used in conjunction with and(), it can be used to report true/false for bit testing on input values (eg: verify the presence of a flag). cpl Takes the input value of type unsigned integer, applies a ones' complement (flips all bits) and returns the result as an unsigned integer. div(<value>) Divides the input value of type unsigned integer by <value>, and returns the result as an unsigned integer. If <value> is null, the largest unsigned integer is returned (typically 2^32-1). even Returns a boolean TRUE if the input value of type unsigned integer is even, otherwise returns FALSE. It is functionally equivalent to "and(1),not". mod(<value>) Divides the input value of type unsigned integer by <value>, and returns the remainder as an unsigned integer. If <value> is null, then zero is returned. mul(<value>) Multiplies the input value of type unsigned integer by <value>, and returns the product as an unsigned integer. In case of overflow, the higher bits are lost, leading to seemingly strange values. 
neg Takes the input value of type unsigned integer, computes the opposite value, and returns the result as an unsigned integer. 0 is identity. This operator is provided for reversed subtractions : in order to subtract the input from a constant, simply perform a "neg,add(value)". not Returns a boolean FALSE if the input value of type unsigned integer is non-null, otherwise returns TRUE. Used in conjunction with and(), it can be used to report true/false for bit testing on input values (eg: verify the absence of a flag). odd Returns a boolean TRUE if the input value of type unsigned integer is odd, otherwise returns FALSE. It is functionally equivalent to "and(1),bool". or(<value>) Performs a bitwise "OR" between <value> and the input value of type unsigned integer, and returns the result as an unsigned integer. sub(<value>) Subtracts <value> from the input value of type unsigned integer, and returns the result as an unsigned integer. Note: in order to subtract the input from a constant, simply perform a "neg,add(value)". xor(<value>) Performs a bitwise "XOR" (exclusive OR) between <value> and the input value of type unsigned integer, and returns the result as an unsigned integer.	2015-01-27 15:41:13 +01:00
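A small C sketch of the unsigned 32-bit semantics listed above. The conv_* names are hypothetical and are not HAProxy's converter functions; they only illustrate the division-by-zero conventions, the bit flip performed by cpl, and the "neg,add(value)" idiom for reversed subtraction:

```c
#include <stdint.h>

/* Illustrative helpers mirroring the documented behaviour on uint32_t. */
static uint32_t conv_div(uint32_t in, uint32_t v) { return v ? in / v : UINT32_MAX; } /* /0 -> max */
static uint32_t conv_mod(uint32_t in, uint32_t v) { return v ? in % v : 0; }          /* %0 -> 0 */
static uint32_t conv_cpl(uint32_t in)             { return ~in; }      /* flip all bits */
static uint32_t conv_neg(uint32_t in)             { return 0u - in; }  /* wraps modulo 2^32 */
static uint32_t conv_add(uint32_t in, uint32_t v) { return in + v; }   /* wraps on overflow */
static int      conv_odd(uint32_t in)             { return (in & 1u) != 0; } /* and(1),bool */
static int      conv_even(uint32_t in)            { return (in & 1u) == 0; } /* and(1),not */
```

For example, conv_add(conv_neg(3), 10) evaluates to 7, i.e. 10 - 3: the negation wraps modulo 2^32 and the addition wraps back.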
Cyril Bonté	3180f7b554	MINOR: ssl: load certificates in alphabetical order As reported by Raphaël Enrici, certificates loaded from a directory are loaded in a non-predictable order. If no certificate was first loaded from a file, it can result in different behaviours when haproxy is used in a cluster. We can also imagine other cases which haven't been met yet. Instead of using readdir(), we can use scandir() and sort files alphabetically. This will ensure a predictable behaviour. This patch should also be backported to 1.5.	2015-01-25 00:48:01 +01:00
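A minimal POSIX C sketch of the readdir() to scandir() change, assuming a plain directory of certificate files. list_sorted() and not_hidden() are hypothetical helpers that only print each path where the real code would load the certificate. Note that alphasort() compares names with strcoll(), so the order follows the current locale (plain byte order in the default "C" locale):

```c
#define _POSIX_C_SOURCE 200809L   /* scandir() and alphasort() are POSIX.1-2008 */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Skip ".", ".." and hidden files. */
static int not_hidden(const struct dirent *de)
{
    return de->d_name[0] != '.';
}

/* List the entries of dir in a deterministic, alphabetical order instead
 * of the filesystem-dependent order returned by readdir().
 * Returns the number of entries visited, or -1 on error. */
static int list_sorted(const char *dir)
{
    struct dirent **de_list;
    int n = scandir(dir, &de_list, not_hidden, alphasort);

    if (n < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        printf("%s/%s\n", dir, de_list[i]->d_name);  /* load the cert here */
        free(de_list[i]);
    }
    free(de_list);
    return n;
}
```

Because every node sorts the same file names the same way, all members of a cluster end up with the same default certificate.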
Willy Tarreau	a0dc23f093	MEDIUM: http: implement http-request set-{method,path,query,uri} This commit implements the following new actions : - "set-method" rewrites the request method with the result of the evaluation of format string <fmt>. There should be very few valid reasons for having to do so as this is more likely to break something than to fix it. - "set-path" rewrites the request path with the result of the evaluation of format string <fmt>. The query string, if any, is left intact. If a scheme and authority are found before the path, they are left intact as well. If the request doesn't have a path ("*"), this one is replaced with the format. This can be used to prepend a directory component in front of a path for example. See also "set-query" and "set-uri". Example : # prepend the host name before the path http-request set-path /%[hdr(host)]%[path] - "set-query" rewrites the request's query string which appears after the first question mark ("?") with the result of the evaluation of format string <fmt>. The part prior to the question mark is left intact. If the request doesn't contain a question mark and the new value is not empty, then one is added at the end of the URI, followed by the new value. If a question mark was present, it will never be removed even if the value is empty. This can be used to add or remove parameters from the query string. See also "set-path" and "set-uri". Example : # replace "%3D" with "=" in the query string http-request set-query %[query,regsub(%3D,=,g)] - "set-uri" rewrites the request URI with the result of the evaluation of format string <fmt>. The scheme, authority, path and query string are all replaced at once. This can be used to rewrite hosts in front of proxies, or to perform complex modifications to the URI such as moving parts between the path and the query string. See also "set-path" and "set-query". All of them are handled by the same parser and the same exec function, which is why they're merged all together. 
For once, instead of adding even more entries to the huge switch/case, we used the new facility to register action keywords. A number of the existing ones should probably move there as well.	2015-01-23 20:27:41 +01:00
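The "set-query" rules above can be captured in a few lines of C. This is a sketch on plain NUL-terminated strings, not HAProxy's buffer-rewriting code; set_query() is a hypothetical name and the caller-provided output buffer is an assumption:

```c
#include <stdio.h>
#include <string.h>

/* Sketch of the "set-query" rules:
 * - if the URI contains '?', everything after the first '?' is replaced
 *   and the '?' is kept even when the new value is empty;
 * - if there is no '?', one is appended only when the new value is not
 *   empty.
 * Writes the new URI into out (size outsz); returns 0, or -1 if it does
 * not fit. */
static int set_query(const char *uri, const char *query, char *out, size_t outsz)
{
    const char *qm = strchr(uri, '?');
    size_t prefix = qm ? (size_t)(qm - uri) : strlen(uri);
    int n;

    if (qm || *query)
        n = snprintf(out, outsz, "%.*s?%s", (int)prefix, uri, query);
    else
        n = snprintf(out, outsz, "%s", uri);    /* nothing to add */
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

So "/a?x=1" with an empty value becomes "/a?", while "/a" with an empty value stays "/a".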
Willy Tarreau	d817e468bf	BUG/MINOR: sample: fix case sensitivity for the regsub converter Two commits ago in 7eda849 ("MEDIUM: samples: add a regsub converter to perform regex-based transformations"), I got caught for the second time with the inverted case sensitivity usage of regex_comp(). So by default it is case insensitive and passing the "i" flag makes it case sensitive. I forgot to recheck that case before committing the cleanup. No harm anyway, nobody had the time to use it.	2015-01-23 20:27:41 +01:00
Simon Horman	0766e441dd	MEDIUM/BUG: Only explicitly report "DOWN (agent)" if the agent health is zero Make the check used to explicitly report "DOWN (agent)" slightly more restrictive such that it only triggers if the agent health is zero. This avoids the following problem. 1. Backend is started disabled, agent check is enabled. 2. Backend is enabled using "set server vip/rip state ready". 3. Health is marked as down using "set server vip/rip health down". At this point the http stats page will report "DOWN (agent)" but the backend being down has nothing to do with the agent check. This problem appears to have been introduced by cf2924bc2537bb08c ("MEDIUM: stats: report down caused by agent prior to reporting up"). Note that "DOWN (agent)" may also be reported by a more generic conditional which immediately follows the code changed by this patch. Reported-by: Mark Brooks <mark@loadbalancer.org> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-01-23 16:47:41 +01:00
Simon Horman	1a23cf0dfb	BUG/MEDIUM: Do not set agent health to zero if server is disabled in config disable starts a server in the disabled state; however, setting the health of an agent to zero implies that the agent is disabled as well as the server. This is a problem because the state of the agent is not restored if the state of the server is subsequently updated, leading to an unexpected state. For example, if a server is started disabled and then the server state is set to ready, then without this change "show stat" indicates that the server is "DOWN (agent)" when it is expected that the server would be UP if its (non-agent) health check passes. Reported-by: Mark Brooks <mark@loadbalancer.org> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-01-23 16:47:41 +01:00
Willy Tarreau	7eda849dce	MEDIUM: samples: add a regsub converter to perform regex-based transformations We can now replace matching regex parts with a string, a la sed. Note that there are at least 3 different behaviours for existing sed implementations when matching 0-length strings. Here is the result of the following operation on each implementation tested : echo 'xzxyz' | sed -e 's/x*y*/A/g' GNU sed 4.2.1 => AzAzA Perl's sed 5.16.1 => AAzAAzA Busybox v1.11.2 sed => AzAz The psed behaviour was adopted because it causes the fewest exceptions in the code and seems logical from a certain perspective : - "x" matches x*y* => add "A" and skip "x" - "z" matches x*y* => add "A" and keep "z", not part of the match - "xy" matches x*y* => add "A" and skip "xy" - "z" matches x*y* => add "A" and keep "z", not part of the match - "" matches x*y* => add "A" and stop here Anyway, given the incompatibilities between implementations, it's unlikely that some processing will rely on this behaviour. There currently is one big limitation : the configuration parser makes it impossible to pass commas or closing parentheses (or even closing brackets in log formats). But that's still quite usable to replace certain characters or character sequences. It will become more complete once the config parser is reworked.	2015-01-22 14:24:53 +01:00
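The psed behaviour can be reproduced with POSIX regcomp()/regexec(), which is also where REG_NOTBOL becomes necessary once matching resumes in the middle of the input. The following is a sketch assuming ERE syntax and a caller-provided output buffer; regsub_all() and out_append() are hypothetical names, not HAProxy's regsub implementation:

```c
#include <regex.h>
#include <string.h>

/* Append n bytes to out (size outsz), keeping it NUL-terminated. */
static int out_append(char *out, size_t outsz, size_t *o, const char *s, size_t n)
{
    if (*o + n >= outsz)
        return -1;
    memcpy(out + *o, s, n);
    *o += n;
    out[*o] = '\0';
    return 0;
}

/* Global substitution with the psed rule for empty matches: emit the
 * replacement, copy the next input char (not part of the match) and
 * resume after it; an empty match at the very end emits the replacement
 * and stops. Returns 0 on success, -1 on regex or buffer error. */
static int regsub_all(const char *in, const char *pattern, const char *repl,
                      char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    size_t pos = 0, o = 0, len = strlen(in), rlen = strlen(repl);
    int ret = 0;

    if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    out[0] = '\0';

    for (;;) {
        int flags = pos ? REG_NOTBOL : 0;   /* '^' must not match mid-input */

        if (regexec(&re, in + pos, 1, &m, flags) != 0) {
            ret = out_append(out, outsz, &o, in + pos, len - pos); /* tail */
            break;
        }
        size_t so = (size_t)m.rm_so, eo = (size_t)m.rm_eo;

        /* unmatched text before the match, then the replacement */
        if (out_append(out, outsz, &o, in + pos, so) ||
            out_append(out, outsz, &o, repl, rlen)) {
            ret = -1;
            break;
        }
        if (eo > so) {
            pos += eo;                       /* non-empty match: skip it */
        } else if (pos + so < len) {
            if (out_append(out, outsz, &o, in + pos + so, 1)) { /* keep char */
                ret = -1;
                break;
            }
            pos += so + 1;
        } else {
            break;                           /* empty match at the end */
        }
    }
    regfree(&re);
    return ret;
}
```

Applied to "xzxyz" with the pattern x*y* and the replacement "A", this sketch yields "AAzAAzA", the Perl's sed result listed in the commit message.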
Willy Tarreau	15a53a4384	MEDIUM: regex: add support for passing regex flags to regex_exec_match() This function (and its sister regex_exec_match2()) abstracts the regex execution but makes it impossible to pass flags to the regex engine. Currently we don't use them but we'll need to support REG_NOTBOL soon (to indicate that we're not at the beginning of a line). So let's add support for this flag and update the API accordingly.	2015-01-22 14:24:53 +01:00
Willy Tarreau	469477879c	MINOR: args: implement a new arg type for regex : ARGT_REG This one will be used when a regex is expected. It is automatically resolved after the parsing and compiled into a regex. Some optional flags are supported in the type-specific flags that should be set by the optional arg checker. One is used during the regex compilation : ARGF_REG_ICASE to ignore case.	2015-01-22 14:24:53 +01:00
Willy Tarreau	3d241e78a1	MEDIUM: args: use #define to specify the number of bits used by arg types and counts This is in order to add new types. This patch does not change anything else. Two remaining (harmless) occurrences of a count of 8 instead of 7 were fixed by this patch : empty_arg_list[] and the for() loop counting args.	2015-01-22 14:24:53 +01:00
Willy Tarreau	8560328211	BUG/MEDIUM: http: make http-request set-header compute the string before removal The way http-request/response set-header works is stupid. Due to a naive reuse of the del-header code, it removes all occurrences of the header to be set before computing the new format string. This makes it almost unusable because it is not possible to append values to an existing header without first copying them to a dummy header, performing the copy back and removing the dummy header. Instead, let's share the same code as add-header and perform the optional removal after the string is computed. That way it becomes possible to write things like : http-request set-header X-Forwarded-For %[hdr(X-Forwarded-For)],%[src] Note that this change is not expected to have any undesirable impact on existing configs since if they rely on the bogus behaviour, they don't work as they always retrieve an empty string. This fix must be backported to 1.5 to stop the spread of ugly configs.	2015-01-21 20:45:00 +01:00
Willy Tarreau	53c250e165	BUG/MINOR: args: add missing entry for ARGT_MAP in arg_type_names This type is currently not used in an argument so it's harmless. But it's better to fill in the name correctly.	2015-01-21 16:06:53 +01:00
Willy Tarreau	324f07f6dd	MEDIUM: backend: add the crc32 hash algorithm for load balancing Since we have it available, let's make it usable for load balancing, it comes at no cost except 3 lines of documentation.	2015-01-20 19:48:14 +01:00
Willy Tarreau	8059977d3e	MINOR: samples: provide a "crc32" converter This converter hashes a binary input sample into an unsigned 32-bit quantity using the CRC32 hash function. Optionally, it is possible to apply a full avalanche hash function to the output if the optional <avalanche> argument equals 1. This converter uses the same functions as used by the various hash-based load balancing algorithms, so it will provide exactly the same results. It is provided for compatibility with other software which wants a CRC32 to be computed on some input keys, so it follows the most common implementation as found in Ethernet, Gzip, PNG, etc... It is slower than the other algorithms but may provide a better or at least less predictable distribution.	2015-01-20 19:48:08 +01:00
Willy Tarreau	c829ee48c7	MINOR: hash: add new function hash_crc32 This function will be used to perform CRC32 computations. This one was loosely inspired by crc32b found here, and focuses on size and speed at the same time : http://www.hackersdelight.org/hdcodetxt/crc.c.txt Much faster table-based versions exist but are pointless for our usage here; this hash already sustains gigabit speed which is far faster than what we'd ever need. Better preserve the CPU's cache instead.	2015-01-20 19:48:05 +01:00
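For reference, a table-free, bit-by-bit CRC-32 in the spirit of crc32b looks like the sketch below: reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF, which is the Ethernet/gzip/PNG variant. crc32_bitwise() is a hypothetical name; this is not HAProxy's hash_crc32() source:

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-by-bit CRC-32 (reflected poly 0xEDB88320). No lookup table, so it
 * stays small and leaves the CPU cache alone, at the cost of 8 loop
 * iterations per input byte. */
static uint32_t crc32_bitwise(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            /* mask is all ones when the low bit is set, zero otherwise */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

The standard check value for this variant is crc32_bitwise("123456789", 9) == 0xCBF43926.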
Willy Tarreau	49ad95cc8e	MINOR: http: add a new fetch "query" to extract the request's query string This fetch extracts the request's query string, which starts after the first question mark. If no question mark is present, this fetch returns nothing. If a question mark is present but nothing follows, it returns an empty string. This means it's possible to easily know whether a query string is present using the "found" matching method. This fetch is the complement of "path" which stops before the question mark.	2015-01-20 19:47:47 +01:00
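The distinction between "no query string" and "empty query string" can be sketched in C by returning NULL in the first case and an empty string in the second. uri_query() is a hypothetical helper working on a NUL-terminated URI, not HAProxy's fetch function:

```c
#include <stddef.h>
#include <string.h>

/* Return the text after the first '?' (possibly ""), or NULL when the
 * URI contains no '?' at all, so that a "found"-style test can tell the
 * two cases apart. */
static const char *uri_query(const char *uri)
{
    const char *qm = strchr(uri, '?');
    return qm ? qm + 1 : NULL;
}
```

For "/a?x=1?y" the sketch returns "x=1?y", since only the first question mark delimits the path.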
Willy Tarreau	d025648f7c	MAJOR: init: automatically set maxconn and/or maxsslconn when possible If a memory size limit is enforced using "-m" on the command line and one or both of maxconn / maxsslconn are not set, instead of using the build-time values, haproxy now computes the number of sessions that can be allocated depending on a number of parameters among which : - global.maxconn (if set) - global.maxsslconn (if set) - maxzlibmem - tune.ssl.cachesize - presence of SSL in at least one frontend (bind lines) - presence of SSL in at least one backend (server lines) - tune.bufsize - tune.cookie_len The purpose is to ensure that haproxy will not run out of memory when maxing out all parameters. If neither maxconn nor maxsslconn are used, it will consider that 100% of the sessions involve SSL on sides where it's supported. That means that it will typically optimize maxconn for SSL offloading or SSL bridging on all connections. This generally means that the simple act of enabling SSL in a frontend or in a backend will significantly reduce the global maxconn but in exchange for that, it will guarantee that it will not fail. All metrics may be enforced using #defines to accommodate variations in SSL libraries or various allocation sizes.	2015-01-15 21:45:22 +01:00
Willy Tarreau	d92aa5c44a	MINOR: global: report information about the cost of SSL connections An SSL connection takes some memory when it exists and during handshakes. We measured up to 16kB for an established endpoint, and up to 76 extra kB during a handshake. The SSL layer stores these values into the global struct during initialization. If other SSL libs are used, it's easy to change these values. Anyway they'll only be used as gross estimates in order to guess the max number of SSL conns that can be established when memory is constrained and the limit is not set.	2015-01-15 21:34:39 +01:00
Willy Tarreau	fce03113fa	MINOR: global: always export some SSL-specific metrics We'll need to know the number of SSL connections, their use and their cost soon. In order to avoid getting tons of ifdefs everywhere, always export SSL information in the global section. We add two flags to know whether or not SSL is used in a frontend and in a backend.	2015-01-15 21:32:40 +01:00
Willy Tarreau	3ca1a883f9	MINOR: tools: add new round_2dig() function to round integers This function rounds down an integer to the closest value having only 2 significant digits.	2015-01-15 19:02:27 +01:00
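Rounding down to two significant digits amounts to dropping low-order digits until two remain, then scaling back up. A C sketch for unsigned values (round_down_2dig() is a hypothetical name; HAProxy's round_2dig() may differ in signature and type):

```c
/* Round down to the closest value with at most two significant digits,
 * e.g. 12345 -> 12000 and 987 -> 980. Values below 100 are unchanged. */
static unsigned int round_down_2dig(unsigned int i)
{
    unsigned int mul = 1;

    while (i >= 100) {      /* drop one low-order digit per iteration */
        i /= 10;
        mul *= 10;
    }
    return i * mul;         /* scale the two kept digits back up */
}
```

The result never exceeds the input, so it cannot overflow.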
Willy Tarreau	8c97ab5eb2	BUG/MAJOR: log: don't try to emit a log if no logger is set send_log() calls update_hdr() to build a log header. It may happen that no logger is defined at all but that we try to send a log anyway (eg: upon startup). This results in a segfault when building the log header because logline was never allocated. This bug was revealed by the recent log-tag changes because the logline is dereferenced after the call to snprintf(). So in 1.5 on most platforms it has no impact because snprintf() will ignore NULL, but not necessarily on all platforms. The fix needs to be backported to 1.5.	2015-01-15 16:29:53 +01:00
Willy Tarreau	319f745ba0	MINOR: channel: rename bi_erase() to channel_truncate() It applies to the channel and it doesn't erase outgoing data, only pending unread data, which is strictly equivalent to what recv() does with MSG_TRUNC, so that new name is more accurate and intuitive.	2015-01-14 20:32:59 +01:00
Willy Tarreau	b5051f8742	MINOR: channel: rename bi_avail() to channel_recv_max() This name more accurately reminds that it applies to a channel and not to a buffer, and that what is returned may be used as a max number of bytes to pass to recv().	2015-01-14 20:26:54 +01:00
Willy Tarreau	3f5096ddf2	MINOR: channel: rename buffer_max_len() to channel_recv_limit() Buffer_max_len() is ambiguous and misleading since it considers the channel. The new name more accurately designates the size limit for received data.	2015-01-14 20:21:43 +01:00
Willy Tarreau	3889fffe92	MINOR: channel: rename channel_full() to !channel_may_recv() This function's name was poorly chosen and is confusing to the point of being suspiciously used at some places. The operations it does always consider the ability to forward pending input data before receiving new data. This is not obvious at all, especially at some places where it was used when consuming outgoing data to know if the buffer has any chance to ever get the missing data. The code needs to be re-audited with that in mind. Care must be taken with existing code since the polarity of the function was switched with the renaming.	2015-01-14 18:41:33 +01:00
Willy Tarreau	ba0902ede4	CLEANUP: channel: rename channel_reserved -> channel_is_rewritable channel_reserved is confusingly named. It is used to know whether or not the rewrite area is left intact for situations where we want to ensure we can use it before proceeding. Let's rename it to fix this confusion.	2015-01-14 18:41:33 +01:00
Willy Tarreau	7c1c217426	BUG/MEDIUM: http: fix header removal when previous header ends with pure LF In 1.4-dev7, a header removal mechanism was introduced with commit 68085d8 ("[MINOR] http: add http_remove_header2() to remove a header value."). Due to a typo in the function, the beginning of the headers gets desynchronized if the header preceding the deleted one ends with an LF/CRLF combination different from the one of the removed header. The reason is that while rewinding the pointer, we go back by a number of bytes taking into account the LF/CRLF status of the removed header instead of the previous one. The case where it fails is in http-request del-header/set-header where the multiple occurrences of a header are present and their LF/CRLF ending differs from the preceding header. The loop then stops because no more headers are found given that the names and length do not match. Another point to take into consideration is that removing headers using a loop of http_find_header2() and this function is inefficient since we remove values one at a time while it could be simpler and faster to remove full header lines. This is something that should be addressed separately. This fix must be backported to 1.5 and 1.4. Note that http-send-name-header relies on this function as well so it could be possible that some of the issues encountered with it in 1.4 come from this bug.	2015-01-07 17:23:50 +01:00
Willy Tarreau	094af4e16e	MINOR: logs: add a new per-proxy "log-tag" directive This is equivalent to what was done in commit 48936af ("[MINOR] log: ability to override the syslog tag") but this time instead of doing this globally, it does it per proxy. The purpose is to be able to use a separate log tag for various proxies (eg: make it easier to route log messages depending on the customer).	2015-01-07 15:03:42 +01:00
Cyril Bonté	f607d81d09	BUG/MEDIUM: backend: correctly detect the domain when use_domain_only is used balance hdr(<name>) provides an option 'use_domain_only' to match only the domain part in a header (designed for the Host header). Olivier Fredj reported that the hashes were not the same for 'subdomain.domain.tld' and 'domain.tld'. This is because the pointer was rewound one step too far, resulting in a hash calculated against wrong values : - '.domai' for 'subdomain.domain.tld' - ' domai' for 'domain.tld' (beginning with the space in the header line) Another special case is when no dot can be found in the header : the hash will be calculated against an empty string. The patch addresses both cases : 'domain' will be used to compute the hash for 'subdomain.domain.tld', 'domain.tld' and 'domain' (using the whole header value for the last case). The fix must be backported to haproxy 1.5 and 1.4.	2015-01-04 19:35:04 +01:00
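The corrected extraction can be sketched in C as "the label just before the last dot, or the whole value if there is no dot". domain_part() is a hypothetical helper operating on a bare header value (no leading space, no port), not HAProxy's code:

```c
#include <stddef.h>
#include <string.h>

/* Return the length of the domain label to hash and store its start in
 * *start: "domain" for both "subdomain.domain.tld" and "domain.tld", and
 * the whole value when it contains no dot ("domain"). */
static size_t domain_part(const char *host, size_t len, const char **start)
{
    const char *end = host + len;
    const char *last_dot = NULL, *prev_dot = NULL;

    for (const char *p = host; p < end; p++) {
        if (*p == '.') {
            prev_dot = last_dot;
            last_dot = p;
        }
    }
    if (!last_dot) {                     /* no dot: use the whole value */
        *start = host;
        return len;
    }
    /* the label starts right after the previous dot, or at the beginning */
    *start = prev_dot ? prev_dot + 1 : host;
    return (size_t)(last_dot - *start);
}
```

Starting one character after the previous dot (rather than on it) is what avoids the '.domai' and ' domai' keys described above.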
Willy Tarreau	3c23a85550	CLEANUP: session: remove session_from_task() Since commit 3dd6a25 ("MINOR: stream-int: retrieve session pointer from stream-int"), we can get the session from the task, so let's get rid of this less obvious function.	2014-12-28 12:19:57 +01:00
Cyril Bonté	ac92a065d7	MINOR: checks: update dynamic environment variables in external checks commit 9ede66b0 introduced an environment variable (HAPROXY_SERVER_CURCONN) that was supposed to be dynamically updated, but it was set only once, during its initialization. Most of the code provided in this previous patch has been rewritten in order to easily update the environment variables without reallocating memory during each check. Now, HAPROXY_SERVER_CURCONN will contain the current number of connections on the server at the time of the check.	2014-12-28 01:22:56 +01:00
Willy Tarreau	56efc4896b	OPTIM: stream-int: try to send pending spliced data This is the equivalent of eb9fd51 ("OPTIM: stream_sock: reduce the amount of in-flight spliced data") whose purpose is to try to immediately send spliced data if available.	2014-12-24 23:47:33 +01:00
Willy Tarreau	9b20c55562	MEDIUM: stream-int: support splicing from applets If we want to splice from applets, we must check the pipe before clearing SI_FL_WAIT_ROOM.	2014-12-24 23:47:33 +01:00
Willy Tarreau	b034b2598d	MEDIUM: channel: implement a zero-copy buffer transfer bi_swpbuf() swaps the buffer passed in argument with the one attached to the channel, but only if this last one is empty. The idea is to avoid a copy when buffers can simply be swapped.	2014-12-24 23:47:33 +01:00
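The zero-copy idea reduces to a pointer swap guarded by an emptiness check. The types and names below (sketch_buf, sketch_chan, chan_swap_buf()) are hypothetical stand-ins, not HAProxy's struct buffer, struct channel or bi_swpbuf():

```c
#include <stddef.h>

/* Hypothetical minimal types: a buffer holding len bytes of data, and a
 * channel pointing to one buffer. */
struct sketch_buf  { size_t len; char *data; };
struct sketch_chan { struct sketch_buf *buf; };

/* If the channel's buffer is empty, swap it with *bp instead of copying
 * bytes; the caller then owns the old (empty) buffer.
 * Returns 1 if the buffers were swapped, 0 if the caller must copy. */
static int chan_swap_buf(struct sketch_chan *c, struct sketch_buf **bp)
{
    struct sketch_buf *old;

    if (c->buf->len != 0)
        return 0;               /* pending data: a swap would lose it */
    old = c->buf;
    c->buf = *bp;
    *bp = old;
    return 1;
}
```

The emptiness check is what makes the swap safe: no data already queued in the channel can be lost.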
Willy Tarreau	33cb065348	MINOR: config: implement global setting tune.buffers.limit This setting is used to limit memory usage without causing the alloc failures caused by "-m". Unexpectedly, tests have shown a performance boost of up to about 18% on HTTP traffic when limiting the number of buffers to about 10% of the amount of concurrent connections. tune.buffers.limit <number> Sets a hard limit on the number of buffers which may be allocated per process. The default value is zero which means unlimited. The minimum non-zero value will always be greater than "tune.buffers.reserve" and should ideally always be about twice as large. Forcing this value can be particularly useful to limit the amount of memory a process may take, while retaining a sane behaviour. When this limit is reached, sessions which need a buffer wait for another one to be released by another session. Since buffers are dynamically allocated and released, the waiting time is very short and not perceptible provided that limits remain reasonable. In fact sometimes reducing the limit may even increase performance by increasing the CPU cache's efficiency. Tests have shown good results on average HTTP traffic with a limit to 1/10 of the expected global maxconn setting, which also significantly reduces memory usage. The memory savings come from the fact that a number of connections will not allocate 2*tune.bufsize. It is best not to touch this value unless advised to do so by an haproxy core developer.	2014-12-24 23:47:33 +01:00
Willy Tarreau	1058ae73f1	MINOR: config: implement global setting tune.buffers.reserve Used in conjunction with the dynamic buffer allocator. tune.buffers.reserve <number> Sets the number of buffers which are pre-allocated and reserved for use only during memory shortage conditions resulting in failed memory allocations. The minimum value is 2 and is also the default. There is no reason a user would want to change this value, it's mostly aimed at haproxy core developers.	2014-12-24 23:47:33 +01:00
Willy Tarreau	a24adf0795	MAJOR: session: only wake up as many sessions as available buffers permit We've already experimented with three wake up algorithms when releasing buffers : the first naive one used to wake up far too many sessions, causing many of them not to get any buffer. The second approach which was still in use prior to this patch consisted in waking up either 1 or 2 sessions depending on the number of FDs we had released. And this was still inaccurate. The third one tried to cover the accuracy issues of the second and took into consideration the number of FDs the sessions would be willing to use, but most of the time we ended up waking up too many of them for nothing, or deadlocking by lack of buffers. This patch completely removes the need to allocate two buffers at once. Instead it splits allocations into critical and non-critical ones and implements a reserve in the pool for this. The deadlock situation happens when all buffers are allocated for requests pending in a maxconn-limited server queue, because then there's no more way to allocate buffers for responses, and these responses are critical to release the server's connection in order to release the pending requests. In fact maxconn on a server creates a dependency between sessions and particularly between oldest session's responses and latest session's requests. Thus, it is mandatory to get a free buffer for a response in order to release a server connection which will make it possible to release a request buffer. Since we definitely have non-symmetrical buffers, we need to implement this logic in the buffer allocation mechanism. What this commit does is implement a reserve of buffers which can only be allocated for responses and that will never be allocated for requests. This is made possible by the requester indicating how much margin it wants to leave after the allocation succeeds. 
Thus it is a cooperative allocation mechanism : the requester (process_session() in general) prefers not to get a buffer in order to respect others' need for response buffers. The session management code always knows if a buffer will be used for requests or responses, so that is not difficult : - either there's an applet on the initiator side and we really need the request buffer (since currently the applet is called in the context of the session) - or we have a connection and we really need the response buffer (in order to support building and sending an error message back) This reserve ensures that we don't take all allocatable buffers for requests waiting in a queue. The downside is that all the extra buffers are really allocated to ensure they can be allocated. But with small values it is not an issue. With this change, we don't observe any more deadlocks even when running with maxconn 1 on a server under severely constrained memory conditions. The code becomes a bit tricky; it relies on the scheduler's run queue to estimate how many sessions are already expected to run so that it doesn't wake up everyone with too few resources. A better solution would probably consist in having two queues, one for urgent requests and one for normal requests. A failed allocation for a session dealing with an error, a connection event, or the need for a response (or request when there's an applet on the left) would go to the urgent request queue, while other requests would go to the other queue. Urgent requests would be served from 1 entry in the pool, while the regular ones would be served only according to the reserve. Despite not yet having this, it works remarkably well. This mechanism is quite efficient; we don't perform too many wake up calls anymore. For 1 million sessions elapsed during massive memory contention, we observe about 4.5M calls to process_session() compared to 4.0M without memory constraints. 
Previously we used to observe up to 16M calls, which roughly means 12M failures. During a test run under high memory constraints (limit enforced to 27 MB instead of the 58 MB normally needed), performance used to drop by 53% prior to this patch. Now, with this patch, it instead increases by about 1.5%. The best effect of this change is that by limiting the memory usage to about 2/3 to 3/4 of what is needed by default, it's possible to increase performance by up to about 18%, mainly because pools are reused more often and remain hot in the CPU cache (observed on regular HTTP traffic with 20k objects, buffers.limit = maxconn/10, buffers.reserve = limit/2). Below is an example of a scenario which used to cause a deadlock: - a connection is received - two buffers are allocated in process_session() then released - one is allocated when receiving an HTTP request - the second buffer is allocated then released in process_session() for request parsing then connection establishment - poll() says we can send, so the request buffer is sent and released - process_session() gets notified that the connection is now established and allocates two buffers then releases them - all other sessions do the same until one cannot get the request buffer without hitting the margin - and now the server responds. stream_interface allocates the response buffer and manages to get it since, being for a response, it has higher priority - but process_session() cannot allocate the request buffer anymore => We could end up with all buffers used by responses so that none may be allocated for a request in process_session(). When the applet processing leaves the session context, the test will have to be changed so that we always allocate a response buffer regardless of the left side (eg: H2->H1 gateway).
A final improvement would consist in being able to retry only the failed I/O operation without waking up a task, but to date all experiments to achieve this have proven not to be reliable enough.	2014-12-24 23:47:33 +01:00
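The reserve described above can be modeled as a counting pool in which a request-side allocation must leave a margin of `reserve` free buffers, while a response-side allocation may dip into it. This is a minimal sketch under that assumption: `struct toy_pool` and the `toy_*` functions are hypothetical and are not HAProxy's pool API, and the real code must also decide which sessions to wake up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a buffer pool: only counts are tracked. */
struct toy_pool {
    unsigned int avail;    /* buffers currently free in the pool */
    unsigned int reserve;  /* buffers kept back for response allocations */
};

/* Take one buffer only if at least <margin> buffers remain free afterwards.
 * Returns true on success, false if the margin would be violated. */
static bool toy_pool_alloc_margin(struct toy_pool *p, unsigned int margin)
{
    if (p->avail <= margin)
        return false;
    p->avail--;
    return true;
}

/* Request buffers must respect the reserve so that responses, which are
 * needed to release server connections, can still be allocated. */
static bool toy_alloc_request(struct toy_pool *p)
{
    return toy_pool_alloc_margin(p, p->reserve);
}

/* Response buffers may use the reserve: no margin is required. */
static bool toy_alloc_response(struct toy_pool *p)
{
    return toy_pool_alloc_margin(p, 0);
}

/* Give a buffer back to the pool. */
static void toy_release(struct toy_pool *p)
{
    p->avail++;
}
```

With `avail = 4` and `reserve = 2`, two request allocations succeed and the third is refused, while response allocations can still take the last two buffers. That asymmetry is what lets a maxconn-limited server finish its responses.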
Willy Tarreau	9841019e42	MINOR: stats: report a "waiting" flag for sessions This flag indicates whether the session is still waiting for some memory.	2014-12-24 23:47:33 +01:00
Willy Tarreau	10fc09e872	MAJOR: session: only allocate buffers when needed A session doesn't need buffers all the time, especially when they're empty. With this patch, we no longer allocate buffers when the session is initialized; we only allocate them in two cases: - during process_session() - during I/O operations. During process_session(), we try hard to allocate both buffers at once so that we know for sure that a started operation can complete. Indeed, a previous version of this patch allocated one buffer at a time, but that can result in a deadlock when, for example, all buffers are allocated for requests and there's no buffer left to emit error responses. Here, if any of the buffers cannot be allocated, the whole operation is cancelled and the session is added to the tail of the buffer wait queue. At the end of process_session(), session_release_buffers() is called so that we can offer unused buffers to other sessions waiting for them. For I/O operations, we only need to allocate a buffer on the Rx path. For this, we only allocate a single buffer but ensure that at least two are available to avoid the deadlock situation. If buffers are not available, SI_FL_WAIT_ROOM is set on the stream interface and the session is queued. Unused buffers resulting either from a successful send() or from an unused read buffer are offered to pending sessions during the ->wake() callback.	2014-12-24 23:47:33 +01:00
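The end-of-processing release step can be sketched as follows, assuming that "unused" means "allocated but holding no data". The types and function names are hypothetical stand-ins, not the actual session_release_buffers() implementation. Empty buffers are freed and counted so that the caller knows how many it can offer to waiting sessions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a buffer and a session's two channels. */
struct toy_buf {
    size_t data;              /* bytes currently stored */
};

struct toy_session {
    struct toy_buf *req;      /* NULL when no buffer is attached */
    struct toy_buf *res;
};

/* Release one buffer slot if it is allocated but empty.
 * Returns 1 if a buffer was released, 0 otherwise. */
static int toy_release_if_empty(struct toy_buf **slot)
{
    if (*slot && (*slot)->data == 0) {
        free(*slot);
        *slot = NULL;
        return 1;
    }
    return 0;
}

/* Called at the end of a processing pass: drop buffers holding no data so
 * they can be offered to other sessions. Buffers with pending data are
 * kept. Returns the number of buffers released. */
static int toy_session_release_buffers(struct toy_session *s)
{
    int released = 0;

    released += toy_release_if_empty(&s->req);
    released += toy_release_if_empty(&s->res);
    return released;
}
```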
Willy Tarreau	bf883e0aa7	MAJOR: session: implement a wait-queue for sessions who need a buffer When session_alloc_buffers() fails to allocate one or two buffers, it subscribes the session to buffer_wq and waits for another session to release buffers. The session is then removed from the queue, woken up with TASK_WAKE_RES, and can attempt its allocation again. We decide to wake as many waiters as we release buffers, so that if we release 2 and two waiters need only one each, they both get their chance. We must never end up in a situation where we don't wake enough tasks up. It's common to release buffers after the completion of an I/O callback, which can happen even if the I/O could not be performed due to a partial memory allocation failure. In this situation, we don't want to remove from the wait queue the session that was just added, otherwise it will never get any buffer. Thus, we only force ourselves out of the queue when freeing the session. Note: at the moment, since session_alloc_buffers() is not used, no task is subscribed to the wait queue.	2014-12-24 23:47:33 +01:00
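A minimal model of the wake-up rule above (one waiter woken per released buffer, in FIFO order) might look like this. `struct toy_wq` and its helpers are hypothetical; HAProxy's actual buffer_wq and TASK_WAKE_RES handling are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter: stands in for a session waiting for buffers. */
struct toy_waiter {
    int woken;                  /* set when the waiter is woken up */
    struct toy_waiter *next;
};

/* FIFO wait queue: new waiters go to the tail, wake-ups start at the head. */
struct toy_wq {
    struct toy_waiter *head;
    struct toy_waiter *tail;
};

static void toy_wq_add(struct toy_wq *q, struct toy_waiter *w)
{
    w->woken = 0;
    w->next = NULL;
    if (q->tail)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
}

/* Wake up to <released> waiters, one per released buffer, unlinking them
 * so they can retry their allocation. Returns the number actually woken. */
static int toy_wq_wake(struct toy_wq *q, int released)
{
    int woken = 0;

    while (woken < released && q->head) {
        struct toy_waiter *w = q->head;

        q->head = w->next;
        if (!q->head)
            q->tail = NULL;
        w->next = NULL;
        w->woken = 1;
        woken++;
    }
    return woken;
}
```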
Willy Tarreau	656859d478	MEDIUM: session: implement a basic atomic buffer allocator This patch introduces session_alloc_recv_buffer(), session_alloc_buffers() and session_release_buffers(), whose purpose will be to allocate missing buffers and release unneeded ones around process_session() and during I/O operations. I/O callbacks only need a single buffer for recv operations and none for send. However, we still want to ensure that we don't pick the last buffer. That's what session_alloc_recv_buffer() is for. This allocator is atomic in that it always ensures we can get 2 buffers, or fails. Here, if any of the buffers is not ready and cannot be allocated, the operation is cancelled. The purpose is to guarantee that we don't enter the deadlock where all buffers are allocated by the same side of all sessions. A queue will have to be implemented for failed allocations. For now they're just reported as failures.	2014-12-24 23:47:32 +01:00
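The all-or-nothing behaviour and the "never pick the last buffer" rule for receives can be sketched with a counting stock of buffers. This is a model built on assumptions (`struct toy_stock`, `struct toy_sess` and the `toy_*` functions are hypothetical), not the code of session_alloc_buffers() or session_alloc_recv_buffer(). On a partial failure, the buffer already taken is given back so the session is left unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stock of free buffers: only a count is tracked. */
struct toy_stock {
    unsigned int avail;
};

/* Hypothetical session: true means the slot holds a buffer. */
struct toy_sess {
    bool req;
    bool res;
};

static bool toy_take(struct toy_stock *p)
{
    if (!p->avail)
        return false;
    p->avail--;
    return true;
}

static void toy_give(struct toy_stock *p)
{
    p->avail++;
}

/* Atomic allocation: either both buffers are present on return, or the
 * session and the stock are left exactly as they were. */
static bool toy_alloc_both(struct toy_stock *p, struct toy_sess *s)
{
    bool got_req = false;

    if (!s->req) {
        if (!toy_take(p))
            return false;
        s->req = true;
        got_req = true;
    }
    if (!s->res) {
        if (!toy_take(p)) {
            if (got_req) {          /* roll back the partial allocation */
                s->req = false;
                toy_give(p);
            }
            return false;
        }
        s->res = true;
    }
    return true;
}

/* Receive-side allocation: take a single buffer, but only if at least two
 * are available, so that the last buffer is never picked. */
static bool toy_alloc_recv(struct toy_stock *p, bool *slot)
{
    if (*slot)
        return true;                /* already has a buffer */
    if (p->avail < 2)
        return false;
    p->avail--;
    *slot = true;
    return true;
}
```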
Willy Tarreau	909e267be0	MINOR: session: group buffer allocations together We'll soon want to release buffers together upon failure so we need to allocate them after the channels. Let's change this now. There's no impact on the behaviour, only the error path is unrolled slightly differently. The same was done in peers.	2014-12-24 23:47:32 +01:00
Willy Tarreau	f2f7d6b27b	MEDIUM: buffer: add a new buf_wanted dummy buffer to report failed allocations Doing so ensures that even when no memory is available, we leave the channel in a sane condition. There's a special case in proto_http.c regarding compression: we simply pre-allocate the tmpbuf to point to the dummy buffer. Not reusing &buf_empty for this allows the rest of the code to differentiate an empty buffer that's not used from an empty buffer that results from a failed allocation, which has the same semantics as a full buffer.	2014-12-24 23:47:32 +01:00
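The distinction between the two dummy buffers can be illustrated with two zero-sized sentinels compared by address. The names and the simulated memory budget below are hypothetical. Only the idea comes from the commit message: an unused empty buffer differs from a failed allocation, and the latter reports no room, like a full buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical buffer header (storage is not modeled). */
struct toy_buffer {
    size_t size;   /* usable capacity, 0 for the sentinels */
    size_t data;   /* bytes currently stored */
};

/* Two distinct zero-sized sentinels, told apart by address only:
 * - toy_buf_empty:  no buffer has been needed yet
 * - toy_buf_wanted: a buffer was needed but could not be allocated */
static struct toy_buffer toy_buf_empty  = { 0, 0 };
static struct toy_buffer toy_buf_wanted = { 0, 0 };

/* Simulated memory limit: how many more buffers may be allocated. */
static int toy_budget = 1;

/* Attach a real buffer to *slot, or the "wanted" sentinel on failure, so
 * that *slot always remains safe to dereference. */
static void toy_b_alloc(struct toy_buffer **slot, size_t size)
{
    struct toy_buffer *b = NULL;

    if (toy_budget > 0)
        b = malloc(sizeof(*b));
    if (!b) {
        *slot = &toy_buf_wanted;
        return;
    }
    toy_budget--;
    b->size = size;
    b->data = 0;
    *slot = b;
}

static bool toy_b_is_unused(const struct toy_buffer *b)
{
    return b == &toy_buf_empty;
}

static bool toy_b_failed(const struct toy_buffer *b)
{
    return b == &toy_buf_wanted;
}

/* Free space: both sentinels report 0, so a failed allocation behaves
 * like a full buffer and nothing can be written into it. */
static size_t toy_b_room(const struct toy_buffer *b)
{
    return b->size - b->data;
}
```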
Willy Tarreau	2a4b54359b	MEDIUM: buffer: always assign a dummy empty buffer to channels Channels are now created with a valid pointer to a buffer before the buffer is allocated. This buffer is a global one called "buf_empty" and of size zero. Thus it prevents any activity from being performed on the buffer and still ensures that chn->buf may always be dereferenced. b_free() also resets the buffer to &buf_empty, and was split into b_drop() which does not reset the buffer.	2014-12-24 23:47:32 +01:00
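The b_free()/b_drop() split described above can be sketched as follows, assuming a channel only holds a buffer pointer. `struct toy_channel`, `toy_b_free()` and `toy_b_drop()` are hypothetical names, not HAProxy's definitions. The drop variant releases the storage but leaves the pointer for the caller to reassign. The free variant also points it back to the shared empty buffer so that it can always be dereferenced.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer header (storage is not modeled). */
struct toy_buffer {
    size_t size;
    size_t data;
};

/* Shared zero-sized buffer every channel points to before allocation. */
static struct toy_buffer toy_buf_empty = { 0, 0 };

/* Hypothetical channel: buf is never NULL. */
struct toy_channel {
    struct toy_buffer *buf;
};

static void toy_channel_init(struct toy_channel *chn)
{
    chn->buf = &toy_buf_empty;
}

/* Release the storage without touching the pointer. The shared sentinel is
 * never freed. The caller must reassign *buf afterwards. */
static void toy_b_drop(struct toy_buffer **buf)
{
    if (*buf != &toy_buf_empty)
        free(*buf);
}

/* Release the storage and point back to the shared empty buffer, so the
 * channel's pointer stays valid to dereference. */
static void toy_b_free(struct toy_buffer **buf)
{
    toy_b_drop(buf);
    *buf = &toy_buf_empty;
}
```

Because the sentinel is never freed, calling `toy_b_free()` on a channel that never got a real buffer is harmless.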
Willy Tarreau	7dfca9daec	MINOR: buffer: only use b_free to release buffers We don't call pool_free2(pool2_buffers) anymore, we only call b_free() to do the job. This ensures that we can start to centralize the releasing of buffers.	2014-12-24 23:47:32 +01:00

6022 Commits