Commit Graph

5065 Commits

Author SHA1 Message Date
Willy Tarreau
bf883e0aa7 MAJOR: session: implement a wait-queue for sessions who need a buffer
When a session_alloc_buffers() fails to allocate one or two buffers,
it subscribes the session to buffer_wq, and waits for another session
to release buffers. It's then removed from the queue and woken up with
TASK_WAKE_RES, and can attempt its allocation again.

We decide to try to wake as many waiters as we release buffers so
that if we release 2 and two waiters need only once, they both have
their chance. We must never come to the situation where we don't wake
enough tasks up.

It's common to release buffers after the completion of an I/O callback,
which can happen even if the I/O could not be performed due to half a
failure on memory allocation. In this situation, we don't want to move
out of the wait queue the session that was just added, otherwise it
will never get any buffer. Thus, we only force ourselves out of the
queue when freeing the session.

Note: at the moment, since session_alloc_buffers() is not used, no task
is subscribed to the wait queue.
2014-12-24 23:47:33 +01:00
Willy Tarreau
656859d478 MEDIUM: session: implement a basic atomic buffer allocator
This patch introduces session_alloc_recv_buffer(), session_alloc_buffers()
and session_release_buffers() whose purpose will be to allocate missing
buffers and release unneeded ones around the process_session() and during
I/O operations.

I/O callbacks only need a single buffer for recv operations and none
for send. However we still want to ensure that we don't pick the last
buffer. That's what session_alloc_recv_buffer() is for.

This allocator is atomic in that it always ensures we can get 2 buffers
or fails. Here, if any of the buffers is not ready and cannot be
allocated, the operation is cancelled. The purpose is to guarantee that
we don't enter into the deadlock where all buffers are allocated by the
same size of all sessions.

A queue will have to be implemented for failed allocations. For now
they're just reported as failures.
2014-12-24 23:47:32 +01:00
Willy Tarreau
f4718e8ec0 MEDIUM: buffer: implement b_alloc_margin()
This function is used to allocate a buffer and ensure that we leave
some margin after it in the pool. The function is not obvious. While
we allocate only one buffer, we want to ensure that at least two remain
available after our allocation. The purpose is to ensure we'll never
enter a deadlock where all sessions allocate exactly one buffer, and
none of them will be able to allocate the second buffer needed to build
a response in order to release the first one.

We also take care of remaining fast in the the fast path by first
checking whether or not there is enough margin, in which case we only
rely on b_alloc_fast() which is guaranteed to succeed. Otherwise we
take the slow path using pool_refill_alloc().
2014-12-24 23:47:32 +01:00
Willy Tarreau
620bd6c88e MINOR: buffer: implement b_alloc_fast()
This function allocates a buffer and replaces *buf with this buffer. If
no memory is available, &buf_wanted is used instead. No control is made
to check if *buf already pointed to another buffer. The allocated buffer
is returned, or NULL in case no memory is available. The difference with
b_alloc() is that this function only picks from the pool and never calls
malloc(), so it can fail even if some memory is available. It is the
caller's job to refill the buffer pool if needed.
2014-12-24 23:47:32 +01:00
Willy Tarreau
909e267be0 MINOR: session: group buffer allocations together
We'll soon want to release buffers together upon failure so we need to
allocate them after the channels. Let's change this now. There's no
impact on the behaviour, only the error path is unrolled slightly
differently. The same was done in peers.
2014-12-24 23:47:32 +01:00
Willy Tarreau
4428a29e52 MEDIUM: channel: do not report full when buf_empty is present on a channel
Till now we'd consider a buffer full even if it had size==0 due to pointing
to buf.size. Now we change this : if buf_wanted is present, it means that we
have already tried to allocate a buffer but failed. Thus the buffer must be
considered full so that we stop trying to poll for reads on it. Otherwise if
it's empty, it's buf_empty and we report !full since we may allocate it on
the fly.
2014-12-24 23:47:32 +01:00
Willy Tarreau
f2f7d6b27b MEDIUM: buffer: add a new buf_wanted dummy buffer to report failed allocations
Doing so ensures that even when no memory is available, we leave the
channel in a sane condition. There's a special case in proto_http.c
regarding the compression, we simply pre-allocate the tmpbuf to point
to the dummy buffer. Not reusing &buf_empty for this allows the rest
of the code to differenciate an empty buffer that's not used from an
empty buffer that results from a failed allocation which has the same
semantics as a buffer full.
2014-12-24 23:47:32 +01:00
Willy Tarreau
2a4b54359b MEDIUM: buffer: always assign a dummy empty buffer to channels
Channels are now created with a valid pointer to a buffer before the
buffer is allocated. This buffer is a global one called "buf_empty" and
of size zero. Thus it prevents any activity from being performed on
the buffer and still ensures that chn->buf may always be dereferenced.
b_free() also resets the buffer to &buf_empty, and was split into
b_drop() which does not reset the buffer.
2014-12-24 23:47:32 +01:00
Willy Tarreau
7dfca9daec MINOR: buffer: only use b_free to release buffers
We don't call pool_free2(pool2_buffers) anymore, we only call b_free()
to do the job. This ensures that we can start to centralize the releasing
of buffers.
2014-12-24 23:47:32 +01:00
Willy Tarreau
696a2910a0 MINOR: buffer: move buffer initialization after channel initialization
It's not clean to initialize the buffer before the channel since it
dereferences one pointer in the channel. Also we'll want to let the
channel pre-initialize the buffer, so let's ensure that the channel
is always initialized prior to the buffers.
2014-12-24 23:47:32 +01:00
Willy Tarreau
e583ea583a MEDIUM: buffer: use b_alloc() to allocate and initialize a buffer
b_alloc() now allocates a buffer and initializes it to the size specified
in the pool minus the size of the struct buffer itself. This ensures that
callers do not need to care about buffer details anymore. Also this never
applies memory poisonning, which is slow and useless on buffers.
2014-12-24 23:47:32 +01:00
Willy Tarreau
474cf54a97 MINOR: buffer: reset a buffer in b_reset() and not channel_init()
We'll soon need to be able to switch buffers without touching the
channel, so let's move buffer initialization out of channel_init().
We had the same in compressoin.c.
2014-12-24 23:47:31 +01:00
Willy Tarreau
3dd6a25323 MINOR: stream-int: retrieve session pointer from stream-int
sess_from_si() does this via the owner (struct task). It works because
all stream ints belong to a task nowadays.
2014-12-24 23:47:31 +01:00
Willy Tarreau
a885f6dc65 MEDIUM: memory: improve pool_refill_alloc() to pass a refill count
Till now this function would only allocate one entry at a time. But with
dynamic buffers we'll like to allocate the number of missing entries to
properly refill the pool.

Let's modify it to take a minimum amount of available entries. This means
that when we know we need at least a number of available entries, we can
ask to allocate all of them at once. It also ensures that we don't move
the pointers back and forth between the caller and the pool, and that we
don't call pool_gc2() for each failed malloc. Instead, it's called only
once and the malloc is only allowed to fail once.
2014-12-24 23:47:31 +01:00
Willy Tarreau
0262241e26 MINOR: memory: cut pool allocator in 3 layers
pool_alloc2() used to pick the entry from the pool, fall back to
pool_refill_alloc(), and to perform the poisonning itself, which
pool_refill_alloc() was also doing. While this led to optimal
code size, it imposes memory poisonning on the buffers as well,
which is extremely slow on large buffers.

This patch cuts the allocator in 3 layers :
  - a layer to pick the first entry from the pool without falling back to
    pool_refill_alloc() : pool_get_first()
  - a layer to allocate a dirty area by falling back to pool_refill_alloc()
    but never performing the poisonning : pool_alloc_dirty()
  - pool_alloc2() which calls the latter and optionally poisons the area

No functional changes were made.
2014-12-24 23:47:31 +01:00
Willy Tarreau
e430e77dfd CLEANUP: memory: replace macros pool_alloc2/pool_free2 with functions
Using inline functions here makes the code more readable and reduces its
size by about 2 kB.
2014-12-24 23:47:31 +01:00
Willy Tarreau
62405a2155 CLEANUP: memory: remove dead code
The very old pool managment code has not been used for the last 7 years
and is still polluting the file. Get rid of it now.
2014-12-24 23:47:31 +01:00
Willy Tarreau
3dd717cd5d CLEANUP: lists: remove dead code
Remove the code dealing with the old dual-linked lists imported from
librt that has remained unused for the last 8 years. Now everything
uses the linux-like circular lists instead.
2014-12-24 23:47:31 +01:00
Willy Tarreau
4f31fc2f28 BUG/MEDIUM: compression: correctly report zlib_mem
In zlib we track memory usage. The problem is that the way alloc_zlib()
and free_zlib() account for memory is different, resulting in variations
that can lead to negative zlib_mem being reported. The alloc() function
uses the requested size while the free() function uses the pool size. The
difference can happen when pools are shared with other pools of similar
size. The net effect is that zlib_mem can be reported negative with a
slowly decreasing count, and over the long term the limit will not be
enforced anymore.

The fix is simple : let's use the pool size in both cases, which is also
the exact value when it comes to memory usage.

This fix must be backported to 1.5.
2014-12-24 18:19:50 +01:00
Willy Tarreau
529c13933b BUG/MAJOR: namespaces: conn->target is not necessarily a server
create_server_socket() used to dereference objt_server(conn->target),
but if the target is not a server (eg: a proxy) then it's NULL and we
get a segfault. This can be reproduced with a proxy using "dispatch"
with no server, even when namespaces are disabled, because that code
is not #ifdef'd. The fix consists in first checking if the target is
a server.

This fix does not need to be backported, this is 1.6-only.
2014-12-24 13:47:55 +01:00
Willy Tarreau
57767b8032 BUG/MEDIUM: memory: fix freeing logic in pool_gc2()
There's a long-standing bug in pool_gc2(). It tries to protect the pool
against releasing of too many entries but the formula is wrong as it
compares allocated to minavail instead of (allocated-used) to minavail.
Under memory contention, it ends up releasing more than what is granted
by minavail and causes trouble to the dynamic buffer allocator.

This bug is in fact major by itself, but since minavail has never been
used till now, there is no impact at least in mainline. A backport to
1.5 is desired anyway in case any future backport or out-of-tree patch
relies on this.
2014-12-23 11:22:57 +01:00
Willy Tarreau
a69fc9f803 BUG/MAJOR: stream-int: properly check the memory allocation return
In stream_int_register_handler(), we call si_alloc_appctx(si) but as a
mistake, instead of checking the return value for a NULL, we test <si>.
This bug was discovered under extreme memory contention (memory for only
two buffers with 500 connections waiting) and after 3 million failed
connections. While it was very hard to produce it, the fix is tagged
major because in theory it could happen when haproxy runs with a very
low "-m" setting preventing from allocating just the few bytes needed
for an appctx. But most users will never be able to trigger it. The
fix was confirmed to address the bug.

This fix must be backported to 1.5.
2014-12-23 11:22:39 +01:00
Thierry FOURNIER
fe1ebcd2cf BUG/MAJOR: ns: HAProxy segfault if the cli_conn is not from a network connection
The path "MAJOR: namespace: add Linux network namespace support" doesn't
permit to use internal data producer like a "peers synchronisation"
system. The result is a segfault when the internal application starts.

This patch fix the commit b3e54fe387
It is introduced in 1.6dev version, it doesn't need to be backported.
2014-12-19 23:39:29 +01:00
Thierry FOURNIER
07e78c50b5 MINOR: map/acl/dumpstats: remove the "Done." message
By convention, the HAProxy CLI doesn't return message if the opration
is sucessfully done. The MAP and ACL returns the "Done." message, an
its noise the output during big MAP or ACL injection.
2014-12-18 23:29:46 +01:00
Willy Tarreau
f6b7001338 BUG/MEDIUM: config: do not propagate processes between stopped processes
Immo Goltz reported a case of segfault while parsing the config where
we try to propagate processes across stopped frontends (those with a
"disabled" statement). The fix is trivial. The workaround consists in
commenting out these frontends, although not always easy.

This fix must be backported to 1.5.
2014-12-18 14:03:31 +01:00
Willy Tarreau
8a95d8cd61 BUG/MINOR: config: fix typo in condition when propagating process binding
propagate_processes() has a typo in a condition :

	if (!from->cap & PR_CAP_FE)
		return;

The return is never taken because each proxy has at least one capability
so !from->cap always evaluates to zero. Most of the time the caller already
checks that <from> is a frontend. In the cases where it's not tested
(use_backend, reqsetbe), the rules have been checked for the context to
be a frontend as well, so in the end it had no nasty side effect.

This should be backported to 1.5.
2014-12-18 14:03:31 +01:00
Godbach
d972203fbc BUG/MINOR: parse: refer curproxy instead of proxy
Since during parsing stage, curproxy always represents a proxy to be operated,
it should be a mistake by referring proxy.

Signed-off-by: Godbach <nylzhaowei@gmail.com>
2014-12-18 11:01:51 +01:00
Godbach
1f1fae6202 BUG/MINOR: http: fix typo: "401 Unauthorized" => "407 Unauthorized"
401 Unauthorized => 407 Unauthorized

Signed-off-by: Godbach <nylzhaowei@gmail.com>
2014-12-17 17:05:49 +01:00
Godbach
d39ae7ddc9 CLEANUP: epoll: epoll_events should be allocated according to global.tune.maxpollevents
Willy: commit f2e8ee2b introduced an optimization in the old speculative epoll
code, which implemented its own event cache. It was needed to store that many
events (it was bound to maxsock/4 btw). Now the event cache lives on its own
and we don't need this anymore. And since events are allocated on the kernel
side, we only need to allocate the events we want to return.

As a result, absmaxevents will be not used anymore. Just remove the definition
and the comment of it, replace it with global.tune.maxpollevents. It is also an
optimization of memory usage for large amounts of sockets.

Signed-off-by: Godbach <nylzhaowei@gmail.com>
2014-12-17 17:04:53 +01:00
PiBa-NL
bd556bf35c DOC: httplog does not support 'no' 2014-12-12 11:21:34 +01:00
Vincent Bernat
1228dc0e7a BUG/MEDIUM: sample: fix random number upper-bound
random() will generate a number between 0 and RAND_MAX. POSIX mandates
RAND_MAX to be at least 32767. GNU libc uses (1<<31 - 1) as
RAND_MAX.

In smp_fetch_rand(), a reduction is done with a multiply and shift to
avoid skewing the results. However, the shift was always 32 and hence
the numbers were not distributed uniformly in the specified range. We
fix that by dividing by RAND_MAX+1. gcc is smart enough to turn that
into a shift:

    0x000000000046ecc8 <+40>:    shr    $0x1f,%rax
2014-12-10 22:45:34 +01:00
Godbach
f2dd68d0e0 DOC: fix a few typos
include/types/proto_http.h: hwen -> when
include/types/server.h: SRV_ST_DOWN -> SRV_ST_STOPPED
src/backend.c: prefer-current-server -> prefer-last-server

Signed-off-by: Godbach <nylzhaowei@gmail.com>
2014-12-10 05:34:55 +01:00
Lukas Tribus
e4e30f7d52 BUILD: ssl: use OPENSSL_NO_OCSP to detect OCSP support
Since commit 656c5fa7e8 ("BUILD: ssl: disable OCSP when using
boringssl) the OCSP code is bypassed when OPENSSL_IS_BORINGSSL
is defined. The correct thing to do here is to use OPENSSL_NO_OCSP
instead, which is defined for this exact purpose in
openssl/opensslfeatures.h.

This makes haproxy forward compatible if boringssl ever introduces
full OCSP support with the additional benefit that it links fine
against a OCSP-disabled openssl.

Signed-off-by: Lukas Tribus <luky-37@hotmail.com>
2014-12-09 20:49:22 +01:00
Willy Tarreau
f3d3482c98 BUG/MEDIUM: tcp-checks: disable quick-ack unless next rule is an expect
Using "option tcp-checks" without any rule is different from not using
it at all in that checks are sent with the TCP quick ack mode enabled,
causing servers to log incoming port probes.

This commit fixes this behaviour by disabling quick-ack on tcp-checks
unless the next rule exists and is an expect. All combinations were
tested and now the behaviour is as expected : basic port probes are
now doing a SYN-SYN/ACK-RST sequence.

This fix must be backported to 1.5.
2014-12-08 12:11:28 +01:00
Willy Tarreau
d2a49592fa BUG/MEDIUM: tcp-check: don't rely on random memory contents
If "option tcp-check" is used and no "tcp-check" rule is specified, we
only look at rule->action which dereferences the proxy's memory and which
can randomly match TCPCHK_ACT_CONNECT or whatever else, causing a check
to fail. This bug is the result of an incorrect fix attempted in commit
f621bea ("BUG/MINOR: tcpcheck connect wrong behavior").

This fix must be backported into 1.5.
2014-12-08 11:52:28 +01:00
Willy Tarreau
e7b9ed33ee BUG/MINOR: tcp-check: don't condition data polling on check type
tcp_check_main() would condition the polling for writes on check->type,
but this is absurd given that check->type == PR_O2_TCPCHK_CHK since this
is the only way we can get there! This patch removes this confusing test.
2014-12-08 11:28:18 +01:00
Cyril Bonté
9ede66b06d MEDIUM: checks: provide environment variables to the external checks
The external command accepted 4 arguments, some with the value "NOT_USED" when
not applicable. In order to make the exernal command more generic, this patch
also provides the values in environment variables. This allows to provide more
information.

Currently, the supported environment variables are :

PATH, as previously provided.

HAPROXY_PROXY_NAME, the backend name
HAPROXY_PROXY_ID, the backend id
HAPROXY_PROXY_ADDR, the first bind address if available (or empty)
HAPROXY_PROXY_PORT, the first bind port if available (or empty)

HAPROXY_SERVER_NAME, the server name
HAPROXY_SERVER_ID, the server id
HAPROXY_SERVER_ADDR, the server address
HAPROXY_SERVER_PORT, the server port if available (or empty)
HAPROXY_SERVER_MAXCONN, the server max connections
HAPROXY_SERVER_CURCONN, the current number of connections on the server
2014-12-02 21:44:33 +01:00
Cyril Bonté
777be861c5 MINOR: checks: allow external checks in backend sections
Previously, external checks required to find at least one listener in order to
pass the <proxy_address> and <proxy_port> arguments to the external script.
It prevented from declaring external checks in backend sections and haproxy
rejected the configuration.

The listener is now optional and values "NOT_USED" are passed if no listener is
found. For instance, this is the case with a backend section.

This is specific to the 1.6 branch.
2014-12-02 21:44:33 +01:00
Willy Tarreau
83f2592bcd BUG/MEDIUM: payload: ensure that a request channel is available
Denys Fedoryshchenko reported a segfault when using certain
sample fetch functions in the "tcp-request connection" rulesets
despite the warnings. This is because some tests for the existence
of the channel were missing.

The fetches which were fixed are :
  - req.ssl_hello_type
  - rep.ssl_hello_type
  - req.ssl_sni

This fix must be backported to 1.5.
2014-11-26 13:32:22 +01:00
Willy Tarreau
4deaf39243 BUG/MEDIUM: patterns: previous fix was incomplete
Dmitry Sivachenko <trtrmitya@gmail.com> reported that commit 315ec42
("BUG/MEDIUM: pattern: don't load more than once a pattern list.")
relies on an uninitialised variable in the stack. While it used to
work fine during the tests, if the uninitialized variable is non-null,
some patterns may be aggregated if loaded multiple times, resulting in
slower processing, which was the original issue it tried to address.

The fix needs to be backported to 1.5.
2014-11-26 13:17:03 +01:00
Willy Tarreau
3b24641745 BUG/MAJOR: sessions: unlink session from list on out of memory
Since embryonic sessions were introduced in 1.5-dev12 with commit
2542b53 ("MAJOR: session: introduce embryonic sessions"), a major
bug remained present. If haproxy cannot allocate memory during
session_complete() (for example, no more buffers), it will not
unlink the new session from the sessions list. This will cause
memory corruptions if the memory area from the session is reused
for anything else, and may also cause bogus output on "show sess"
on the CLI.

This fix must be backported to 1.5.
2014-11-25 22:09:05 +01:00
Emeric Brun
c9a0f6d023 MINOR: samples: add the word converter.
word(<index>,<delimiters>)
  Extracts the nth word considering given delimiters from an input string.
  Indexes start at 1 and delimiters are a string formatted list of chars.
2014-11-25 14:48:39 +01:00
Willy Tarreau
23a5c396ec DEBUG: pools: apply poisonning on every allocated pool
Till now, when memory poisonning was enabled, it used to be done only
after a calloc(). But sometimes it's not enough to detect unexpected
sharing, so let's ensure that we now poison every allocation once it's
in place. Note that enabling poisonning significantly hurts performance
(it can typically half the overall performance).
2014-11-25 13:48:43 +01:00
Emeric Brun
f399b0debf MINOR: samples: adds the field converter.
field(<index>,<delimiters>)
  Extracts the substring at the given index considering given delimiters from
  an input string. Indexes start at 1 and delimiters are a string formatted
  list of chars.
2014-11-24 17:44:02 +01:00
Emeric Brun
54c4ac8417 MINOR: samples: adds the bytes converter.
bytes(<offset>[,<length>])
  Extracts a some bytes from an input binary sample. The result is a
  binary sample starting at an offset (in bytes) of the original sample
  and optionnaly truncated at the given length.
2014-11-24 17:44:02 +01:00
Willy Tarreau
0f30d26dbf MINOR: sample: add a few basic internal fetches (nbproc, proc, stopping)
Sometimes, either for debugging or for logging we'd like to have a bit
of information about the running process. Here are 3 new fetches for this :

nbproc : integer
  Returns an integer value corresponding to the number of processes that were
  started (it equals the global "nbproc" setting). This is useful for logging
  and debugging purposes.

proc : integer
  Returns an integer value corresponding to the position of the process calling
  the function, between 1 and global.nbproc. This is useful for logging and
  debugging purposes.

stopping : boolean
  Returns TRUE if the process calling the function is currently stopping. This
  can be useful for logging, or for relaxing certain checks or helping close
  certain connections upon graceful shutdown.
2014-11-24 17:44:02 +01:00
Emeric Brun
4b9e80268e BUG/MINOR: samples: fix unnecessary memcopy converting binary to string. 2014-11-24 17:44:02 +01:00
Willy Tarreau
42fb809cf4 BUG/MINOR: peers: the buffer size is global.tune.bufsize, not trash.size
Currently this is harmless since trash.size is copied from
global.tune.bufsize, but this may soon change when buffers become
more dynamic.

At least for consistency it should be backported to 1.5.
2014-11-24 15:40:57 +01:00
Thierry FOURNIER
315ec4217f BUG/MEDIUM: pattern: don't load more than once a pattern list.
A memory optimization can use the same pattern expression for many
equal pattern list (same parse method, index method and index_smp
method).

The pattern expression is returned by "pattern_new_expr", but this
function dont indicate if the returned pattern is already in use.

So, the caller function reload the list of patterns in addition with
the existing patterns. This behavior is not a problem with tree indexed
pattern, but it grows the lists indexed patterns.

This fix add a "reuse" flag in return of the function "pattern_new_expr".
If the flag is set, I suppose that the patterns are already loaded.

This fix must be backported into 1.5.
2014-11-24 15:40:16 +01:00
Willy Tarreau
cbf8cf68f5 DOC: fix typo in the body parser documentation for msg.sov
"positive" ... "null or positive". The second one is "null or negative".
2014-11-21 21:11:30 +01:00