34 Commits

Author SHA1 Message Date
Willy Tarreau
1c4b814087 MEDIUM: listener: support rebinding during resume()
When a listener resumes operations, supporting a full rebind makes it
possible to perform a full stop as a pause(). This will be used for
pausing abstract namespace unix sockets.
2014-07-08 01:13:35 +02:00
Willy Tarreau
092d865c53 MEDIUM: listener: implement a per-protocol pause() function
In order to fix the abstact socket pause mechanism during soft restarts,
we'll need to proceed differently depending on the socket protocol. The
pause_listener() function already supports some protocol-specific handling
for the TCP case.

This commit makes this cleaner by adding a new ->pause() function to the
protocol struct, which, if defined, may be used to pause a listener of a
given protocol.

For now, only TCP has been adapted, with the specific code moved from
pause_listener() to tcp_pause_listener().
2014-07-08 01:13:34 +02:00
Willy Tarreau
39447b6a57 BUG/MINOR: listener: set the listener's fd to -1 after deletion
This is currently harmless, but when stopping a listener, its fd is
closed but not set to -1, so it is not possible to re-open it again.
Currently this has no impact but can have after the abstract sockets
are modified to perform a complete close on soft-reload.

The fix can be backported to 1.5 and may even apply to 1.4 (protocols.c).
2014-07-08 01:13:34 +02:00
Willy Tarreau
ae30253c27 MAJOR: listener: only start listeners bound to the same processes
Now that we know what processes a "bind" statement is attached to, we
have the ability to avoid starting some of them when they're not on the
proper process. This feature is disabled when running in foreground
however, so that debug mode continues to work with everything bound to
the first and only process.

The main purpose of this change is to finally allow the global stats
sockets to be each bound to a different process.

It can also be used to force haproxy to use different sockets in different
processes for the same IP:port. The purpose is that under Linux 3.9 and
above (and possibly other OSes), when multiple processes are bound to the
same IP:port via different sockets, the system is capable of performing
a perfect round-robin between the socket queues instead of letting any
process pick all the connections from a queue. This results in a smoother
load balancing and may achieve a higher performance with a large enough
maxaccept setting.
2014-05-09 19:16:26 +02:00
Willy Tarreau
6ae1ba6f29 MEDIUM: listener: parse the new "process" bind keyword
This sets the bind_proc entry in the bind_conf config block. For now it's
still unused, but the doc was updated.
2014-05-09 19:16:26 +02:00
Willy Tarreau
bb66030a30 MEDIUM: listener: make the accept function more robust against pauses
During some tests in multi-process mode under Linux, it appeared that
issuing "disable frontend foo" on the CLI to pause a listener would
make the shutdown(read) of certain processes disturb another process
listening on the same socket, resulting in a 100% CPU loop. What
happens is that accept() returns EAGAIN without accepting anything.
Fortunately, we see that epoll_wait() reports EPOLLIN+EPOLLRDHUP
(likely because the FD points to the same file in the kernel), so we
can use that to stop the other process from trying to accept connections
for a short time and try again later, hoping for the situation to change.
We must not disable the FD otherwise there's no way to re-enable it.

Additionally, during these tests, a loop was encountered on EINVAL which
was not caught. Now if we catch an EINVAL, we proceed the same way, in
case the socket is re-enabled later.
2014-05-07 23:13:08 +02:00
Willy Tarreau
95ccdde1f2 BUILD: listener: add fcntl.h and unistd.h
Otherwise it fails to build on some platforms.
2014-02-01 09:29:03 +01:00
Willy Tarreau
818dca5098 BUG/MEDIUM: listener: improve detection of non-working accept4()
On ARM, glibc does not implement accept4() and simply returns ENOSYS
which was not caught as a reason to fall back to accept(), resulting
in a spinning process since poll() would call again.

Let's change the error detection mechanism to save the broken status
of the syscall into a local variable that is used to fall back to the
legacy accept().

In addition to this, since the code was becoming a bit messy, the
accept4() was removed, so now the fallback code and the legacy code
are the same. This will also increase bug report accuracy if needed.

This is 1.5-specific, no backport is needed.
2014-01-31 19:40:19 +01:00
Willy Tarreau
e43d5323c6 MEDIUM: listener: apply a limit on the session rate submitted to SSL
Just like the previous commit, we sometimes want to limit the rate of
incoming SSL connections. While it can be done for a frontend, it was
not possible for a whole process, which makes sense when multiple
processes are running on a system to server multiple customers.

The new global "maxsslrate" setting is usable to fix a limit on the
session rate going to the SSL frontends. The limits applies before
the SSL handshake and not after, so that it saves the SSL stack from
expensive key computations that would finally be aborted before being
accounted for.

The same setting may be changed at run time on the CLI using
"set rate-limit ssl-session global".
2014-01-28 15:50:10 +01:00
Willy Tarreau
93e7c006c1 MEDIUM: listener: add support for limiting the session rate in addition to the connection rate
It's sometimes useful to be able to limit the connection rate on a machine
running many haproxy instances (eg: per customer) but it removes the ability
for that machine to defend itself against a DoS. Thus, better also provide a
limit on the session rate, which does not include the connections rejected by
"tcp-request connection" rules. This permits to have much higher limits on
the connection rate without having to raise the session rate limit to insane
values.

The limit can be changed on the CLI using "set rate-limit sessions global",
or in the global section using "maxsessrate".
2014-01-28 15:49:27 +01:00
Willy Tarreau
6c11bd2f89 OPTIM: raw-sock: don't speculate after a short read if polling is enabled
This is the reimplementation of the "done" action : when we experience
a short read, we're almost certain that we've exhausted the system's
buffers and that we'll meet an EAGAIN if we attempt to read again. If
the FD is not yet polled, the stream interface already takes care of
stopping the speculative read. When the FD is already being polled, we
have two options :
  - either we're running from a level-triggered poller, in which case
    we'd rather report that we've reached the end so that we don't
    speculate over the poller and let it report next time data are
    available ;

  - or we're running from an edge-triggered poller in which case we
    have no choice and have to see the EAGAIN to re-enable events.

At the moment we don't have any edge-triggered poller, so it's desirable
to avoid speculative I/O that we know will fail.

Note that this must not be ported to SSL since SSL hides the real
readiness of the file descriptor.

Thanks to this change, we observe no EAGAIN anymore during keep-alive
transfers, and failed recvfrom() are reduced by half in http-server-close
mode (the client-facing side is always being polled and the second recv
can be avoided). Doing so results in about 5% performance increase in
keep-alive mode. Similarly, we used to have up to about 1.6% of EAGAIN
on accept() (1/maxaccept), and these have completely disappeared under
high loads.
2014-01-26 00:42:32 +01:00
Willy Tarreau
f817e9f473 MAJOR: polling: rework the whole polling system
This commit heavily changes the polling system in order to definitely
fix the frequent breakage of SSL which needs to remember the last
EAGAIN before deciding whether to poll or not. Now we have a state per
direction for each FD, as opposed to a previous and current state
previously. An FD can have up to 8 different states for each direction,
each of which being the result of a 3-bit combination. These 3 bits
indicate a wish to access the FD, the readiness of the FD and the
subscription of the FD to the polling system.

This means that it will now be possible to remember the state of a
file descriptor across disable/enable sequences that generally happen
during forwarding, where enabling reading on a previously disabled FD
would result in forgetting the EAGAIN flag it met last time.

Several new state manipulation functions have been introduced or
adapted :
  - fd_want_{recv,send} : enable receiving/sending on the FD regardless
    of its state (sets the ACTIVE flag) ;

  - fd_stop_{recv,send} : stop receiving/sending on the FD regardless
    of its state (clears the ACTIVE flag) ;

  - fd_cant_{recv,send} : report a failure to receive/send on the FD
    corresponding to EAGAIN (clears the READY flag) ;

  - fd_may_{recv,send}  : report the ability to receive/send on the FD
    as reported by poll() (sets the READY flag) ;

Some functions are used to report the current FD status :

  - fd_{recv,send}_active
  - fd_{recv,send}_ready
  - fd_{recv,send}_polled

Some functions were removed :
  - fd_ev_clr(), fd_ev_set(), fd_ev_rem(), fd_ev_wai()

The POLLHUP/POLLERR flags are now reported as ready so that the I/O layers
knows it can try to access the file descriptor to get this information.

In order to simplify the conditions to add/remove cache entries, a new
function fd_alloc_or_release_cache_entry() was created to be used from
pollers while scanning for updates.

The following pollers have been updated :

   ev_select() : done, built, tested on Linux 3.10
   ev_poll()   : done, built, tested on Linux 3.10
   ev_epoll()  : done, built, tested on Linux 3.10 & 3.13
   ev_kqueue() : done, built, tested on OpenBSD 5.2
2014-01-26 00:42:30 +01:00
Willy Tarreau
a593ec5bf4 MEDIUM: listener: fix polling management in the accept loop
The accept loop used to force fd_poll_recv() even in places where it
was not completely appropriate (eg: unexpected errors). It does not
yet cause trouble but will do with the upcoming polling changes. Let's
use it only where relevant now. EINTR/ECONNABORTED do not result in
poll() anymore but the failed connection is simply skipped (this code
dates from 1.1.32 when error codes were first considered).
2014-01-20 22:27:16 +01:00
Willy Tarreau
4448925930 BUILD/MINOR: listener: remove a glibc warning on accept4()
The accept4() Linux syscall requires _GNU_SOURCE on ix86, otherwise
it emits a warning. On other archs including x86_64, this problem
doesn't happen. Thanks to Charles Carter from Sigma Software for
reporting this.
2014-01-14 17:54:12 +01:00
Willy Tarreau
ef38c39287 MEDIUM: sample: systematically pass the keyword pointer to the keyword
We're having a lot of duplicate code just because of minor variants between
fetch functions that could be dealt with if the functions had the pointer to
the original keyword, so let's pass it as the last argument. An earlier
version used to pass a pointer to the sample_fetch element, but this is not
the best solution for two reasons :
  - fetch functions will solely rely on the keyword string
  - some other smp_fetch_* users do not have the pointer to the original
    keyword and were forced to pass NULL.

So finally we're passing a pointer to the keyword as a const char *, which
perfectly fits the original purpose.
2013-08-01 21:17:13 +02:00
Willy Tarreau
dc13c11c1e BUG/MEDIUM: prevent gcc from moving empty keywords lists into BSS
Benoit Dolez reported a failure to start haproxy 1.5-dev19. The
process would immediately report an internal error with missing
fetches from some crap instead of ACL names.

The cause is that some versions of gcc seem to trim static structs
containing a variable array when moving them to BSS, and only keep
the fixed size, which is just a list head for all ACL and sample
fetch keywords. This was confirmed at least with gcc 3.4.6. And we
can't move these structs to const because they contain a list element
which is needed to link all of them together during the parsing.

The bug indeed appeared with 1.5-dev19 because it's the first one
to have some empty ACL keyword lists.

One solution is to impose -fno-zero-initialized-in-bss to everyone
but this is not really nice. Another solution consists in ensuring
the struct is never empty so that it does not move there. The easy
solution consists in having a non-null list head since it's not yet
initialized.

A new "ILH" list head type was thus created for this purpose : create
an Initialized List Head so that gcc cannot move the struct to BSS.
This fixes the issue for this version of gcc and does not create any
burden for the declarations.
2013-06-21 23:29:02 +02:00
Willy Tarreau
6d4e4e8dd2 MEDIUM: acl: remove a lot of useless ACLs that are equivalent to their fetches
The following 116 ACLs were removed because they're redundant with their
fetch function since last commit which allows the fetch function to be
used instead for types BOOL, INT and IP. Most places are now left with
an empty ACL keyword list that was not removed so that it's easier to
add other ACLs later.

always_false, always_true, avg_queue, be_conn, be_id, be_sess_rate, connslots,
nbsrv, queue, srv_conn, srv_id, srv_is_up, srv_sess_rate, res.comp, fe_conn,
fe_id, fe_sess_rate, dst_conn, so_id, wait_end, http_auth, http_first_req,
status, dst, dst_port, src, src_port, sc1_bytes_in_rate, sc1_bytes_out_rate,
sc1_clr_gpc0, sc1_conn_cnt, sc1_conn_cur, sc1_conn_rate, sc1_get_gpc0,
sc1_gpc0_rate, sc1_http_err_cnt, sc1_http_err_rate, sc1_http_req_cnt,
sc1_http_req_rate, sc1_inc_gpc0, sc1_kbytes_in, sc1_kbytes_out, sc1_sess_cnt,
sc1_sess_rate, sc1_tracked, sc1_trackers, sc2_bytes_in_rate,
sc2_bytes_out_rate, sc2_clr_gpc0, sc2_conn_cnt, sc2_conn_cur, sc2_conn_rate,
sc2_get_gpc0, sc2_gpc0_rate, sc2_http_err_cnt, sc2_http_err_rate,
sc2_http_req_cnt, sc2_http_req_rate, sc2_inc_gpc0, sc2_kbytes_in,
sc2_kbytes_out, sc2_sess_cnt, sc2_sess_rate, sc2_tracked, sc2_trackers,
sc3_bytes_in_rate, sc3_bytes_out_rate, sc3_clr_gpc0, sc3_conn_cnt,
sc3_conn_cur, sc3_conn_rate, sc3_get_gpc0, sc3_gpc0_rate, sc3_http_err_cnt,
sc3_http_err_rate, sc3_http_req_cnt, sc3_http_req_rate, sc3_inc_gpc0,
sc3_kbytes_in, sc3_kbytes_out, sc3_sess_cnt, sc3_sess_rate, sc3_tracked,
sc3_trackers, src_bytes_in_rate, src_bytes_out_rate, src_clr_gpc0,
src_conn_cnt, src_conn_cur, src_conn_rate, src_get_gpc0, src_gpc0_rate,
src_http_err_cnt, src_http_err_rate, src_http_req_cnt, src_http_req_rate,
src_inc_gpc0, src_kbytes_in, src_kbytes_out, src_sess_cnt, src_sess_rate,
src_updt_conn_cnt, table_avl, table_cnt, ssl_c_ca_err, ssl_c_ca_err_depth,
ssl_c_err, ssl_c_used, ssl_c_verify, ssl_c_version, ssl_f_version, ssl_fc,
ssl_fc_alg_keysize, ssl_fc_has_crt, ssl_fc_has_sni, ssl_fc_use_keysize,
2013-06-11 21:22:58 +02:00
Willy Tarreau
d86e29d2a1 CLEANUP: acl: remove unused references to ACL_USE_*
Now that acl->requires is not used anymore, we can remove all references
to it as well as all ACL_USE_* flags.
2013-04-03 02:13:00 +02:00
Willy Tarreau
c48c90dfa5 MAJOR: acl: remove the arg_mask from the ACL definition and use the sample fetch's
Now that ACLs solely rely on sample fetch functions, make them use the
same arg mask. All inconsistencies have been fixed separately prior to
this patch, so this patch almost only adds a new pointer indirection
and removes all references to ARG*() in the definitions.

The parsing is still performed by the ACL code though.
2013-04-03 02:12:58 +02:00
Willy Tarreau
8ed669b12a MAJOR: acl: make all ACLs reference the fetch function via a sample.
ACL fetch functions used to directly reference a fetch function. Now
that all ACL fetches have their sample fetches equivalent, we can make
ACLs reference a sample fetch keyword instead.

In order to simplify the code, a sample keyword name may be NULL if it
is the same as the ACL's, which is the most common case.

A minor change appeared, http_auth always expects one argument though
the ACL allowed it to be missing and reported as such afterwards, so
fix the ACL to match this. This is not really a bug.
2013-04-03 02:12:58 +02:00
Willy Tarreau
0ccb744ffb MINOR: listener: rename sample fetch functions and declare the sample keywords
The following sample fetch functions were only usable by ACLs but are now
usable by sample fetches too :

          dst_conn, so_id,

The fetch functions have been renamed "smp_fetch_*".
2013-04-03 02:12:57 +02:00
Willy Tarreau
50de90a228 MINOR: listeners: make the accept loop more robust when maxaccept==0
If some listeners are mistakenly configured with 0 as the maxaccept value,
then we now consider them as limited to one accept() at a time. This will
avoid some issues as fixed by the past commit.
2012-11-23 20:22:10 +01:00
Willy Tarreau
16a2147dfe MEDIUM: adjust the maxaccept per listener depending on the number of processes
global.tune.maxaccept was used for all listeners. This becomes really not
convenient when some listeners are bound to a single process and other ones
are bound to many processes.

Now we change the principle : we count the number of processes a listener
is bound to, and apply the maxaccept either entirely if there is a single
process, or divided by twice the number of processes in order to maintain
fairness.

The default limit has also been increased from 32 to 64 as it appeared that
on small machines, 32 was too low to achieve high connection rates.
2012-11-19 12:39:59 +01:00
Willy Tarreau
6b3b0d4736 MEDIUM: listener: provide a fallback for accept4() when not supported
It happens that on some systems, the libc is recent enough to permit
building with accept4() but the kernel does not support it. The result
is then a disaster since no connection is accepted. We now detect this
and automatically fall back to accept() and fcntl() when this happens.
2012-10-22 19:32:55 +02:00
Willy Tarreau
1bc4aab290 MEDIUM: listener: add support for linux's accept4() syscall
On Linux, accept4() does the same as accept() except that it allows
the caller to specify some flags to set on the resulting socket. We
use this to set the O_NONBLOCK flag and thus to save one fcntl()
call in each connection. The effect is a small performance gain of
around 1%.

The option is automatically enabled when target linux2628 is set, or
when the USE_ACCEPT4 Makefile variable is set. If the libc is too old
to provide the equivalent function, this is automatically detected and
our own function is used instead. In any case it is possible to force
the use of our implementation with USE_MY_ACCEPT4.
2012-10-08 20:11:03 +02:00
Willy Tarreau
b3fb60bdcd BUG/MEDIUM: listener: don't pause protocols that do not support it
Pausing a UNIX_STREAM socket results in a major pain because the socket
does not correctly resume, it wakes poll() but return EAGAIN on accept(),
resulting in a busy loop. So let's only pause protocols that support it.

This issues has existed since UNIX sockets were introduced on bind lines.
2012-10-04 08:58:21 +02:00
Willy Tarreau
82569f9158 MEDIUM: monitor: simplify handling of monitor-net and mode health
We were having several different behaviours with monitor-net and
"mode health" :
  - monitor-net on TCP connections was evaluated just after accept(),
    did not count a connection on the frontend and were not subject
    to tcp-request connection rules, and caused an immediate close().

  - monitor-net in HTTP mode was evaluated once the session was
    accepted (eg: on top of SSL), returned "HTTP/1.0 200 OK\r\n\r\n"
    over the connection's data layer and instanciated a session which
    was responsible for closing this connection. A connection AND a
    session were counted for the frontend ;

  - "mode health" with "option httpchk" would do exactly the same as
    monitor-net in HTTP mode ;

  - "mode health" without "option httpchk" would do the same as above
    except that "OK" was returned instead of "HTTP/1.0 200 OK\r\n\r\n".

None of them took care of cleaning the input buffer, sometimes resulting
in a TCP reset to be emitted after the last packet if a request was received
over the connection.

Given the inconsistencies and the complexity in keeping all these features
handled at the right position, we now slightly changed the way they are
handled :

  - all of them are handled just after the "tcp-request connection" rules,
    so that all of them may be blocked using such rules, offering more
    flexibility and consistency ;

  - no connection handshake is performed anymore for non-TCP modes

  - all of them send the response as raw data over the socket, there is no
    more difference between TCP and HTTP mode for example (these rules were
    never meant to be served over SSL connections and were never documented
    as able to do that).

  - any possible pending data on the incoming socket is drained before the
    response is sent, in order to avoid the risk of a reset.

  - none of them exactly did what was documented !

This results in more consistent, more flexible and more accurate handling of
monitor rules, with smaller and more robust code.
2012-09-28 00:01:22 +02:00
Willy Tarreau
eb6cead1de MINOR: standard: make memprintf() support a NULL destination
Doing so removes many checks that were systematically made because
the callees don't know if the caller passed a valid pointer.
2012-09-24 10:53:16 +02:00
Willy Tarreau
4348fad1c1 MAJOR: listeners: use dual-linked lists to chain listeners with frontends
Navigating through listeners was very inconvenient and error-prone. Not to
mention that listeners were linked in reverse order and reverted afterwards.
In order to definitely get rid of these issues, we now do the following :
  - frontends have a dual-linked list of bind_conf
  - frontends have a dual-linked list of listeners
  - bind_conf have a dual-linked list of listeners
  - listeners have a pointer to their bind_conf

This way we can now navigate from anywhere to anywhere and always find the
proper bind_conf for a given listener, as well as find the list of listeners
for a current bind_conf.
2012-09-20 16:48:07 +02:00
Willy Tarreau
51fb7651c4 MINOR: listener: add a scope field in the bind keyword lists
This scope is used to report what the keywords are used for (eg: TCP,
UNIX, ...). It is now reported by bind_dump_kws().
2012-09-18 18:27:14 +02:00
Willy Tarreau
8638f4850f MEDIUM: config: enumerate full list of registered "bind" keywords upon error
When an unknown "bind" keyword is detected, dump the list of all
registered keywords. Unsupported default alternatives are also reported
as "not supported".
2012-09-18 18:27:14 +02:00
Willy Tarreau
3dcc341720 MEDIUM: config: move the common "bind" settings to listener.c
These ones are better placed in listener.c than in cfgparse.c, by relying
on the bind keyword registration subsystem.
2012-09-18 17:17:28 +02:00
Willy Tarreau
269826659d MEDIUM: listener: add a minimal framework to register "bind" keyword options
With the arrival of SSL, the "bind" keyword has received even more options,
all of which are processed in cfgparse in a cumbersome way. So it's time to
let modules register their own bind options. This is done very similarly to
the ACLs with a small difference in that we make the difference between an
unknown option and a known, unimplemented option.
2012-09-15 22:33:08 +02:00
Willy Tarreau
d1d5454180 REORG: split "protocols" files into protocol and listener
It was becoming confusing to have protocols and listeners in the same
files, split them.
2012-09-15 22:29:32 +02:00