Commit Graph

110 Commits

Author SHA1 Message Date
Willy Tarreau
61cfdf4fd8 CLEANUP: tree-wide: replace free(x);x=NULL with ha_free(&x)
This makes the code more readable and less prone to copy-paste errors.
In addition, it makes it possible to place some __builtin_constant_p()
predicates to trigger a link-time error in case the compiler knows that the
freed area is constant. It will also produce a compile-time error when trying
to free something that is not a regular pointer (e.g. a function).
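
A minimal sketch of the idea, assuming a GCC/Clang-style __typeof__ extension.
The names sketch_ha_free() and sketch_release_demo() are hypothetical and this
is not HAProxy's actual definition; in particular the __builtin_constant_p()
link-time check and the DEBUG_MEM_STATS accounting are left out:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ha_free(): takes the ADDRESS of a pointer,
 * frees what it points to and resets the caller's pointer to NULL.
 */
#define sketch_ha_free(ptr_addr)                          \
	do {                                              \
		__typeof__(ptr_addr) sk_p_ = (ptr_addr);  \
		free(*sk_p_);                             \
		*sk_p_ = NULL;                            \
	} while (0)

/* Frees a heap buffer through the macro. Returns 1 if the caller's
 * pointer was reset to NULL, -1 if the allocation failed.
 */
static int sketch_release_demo(void)
{
	char *name = malloc(16);

	if (!name)
		return -1;
	sketch_ha_free(&name);
	assert(name == NULL);
	return name == NULL;
}
```

Because the pointer is reset in the same statement, calling sketch_ha_free()
twice on the same variable is harmless: the second call only does free(NULL).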

The DEBUG_MEM_STATS macro now also defines an instance for ha_free()
so that all these calls can be checked.

178 occurrences were converted. The vast majority of them were handled
by the following Coccinelle script, some slightly refined to better deal
with "&*x" or with long lines:

  @ rule @
  expression E;
  @@
  - free(E);
  - E = NULL;
  + ha_free(&E);

It was verified that the resulting code is the same, give or take a
handful of cases where the compiler optimized the temporary variable
that holds the copy of the pointer slightly differently.

A non-negligible amount of {free(str);str=NULL;str_len=0;} are still
present in the config part (mostly header names in proxies). These
ones should also be cleaned for the same reasons, and probably be
turned into ist strings.
2021-02-26 21:21:09 +01:00
Willy Tarreau
38e8a1c7b8 MINOR: debug: add a new DEBUG_FD build option
When DEBUG_FD is set at build time, we'll keep a counter of per-FD events
in the fdtab. This counter is reported in "show fd" even for closed FDs if
not zero. The purpose is to help spot situations where an apparently closed
FD continues to be reported in loops, or where some events are dismissed.
2020-06-23 10:04:54 +02:00
Willy Tarreau
bc52bec163 MEDIUM: fd: add experimental support for edge-triggered polling
Some of the recent optimizations around the polling to save a few
epoll_ctl() calls have shown that they could also cause some trouble.
However, over time our code base has become totally asynchronous with
I/Os always attempted from the upper layers and only retried at the
bottom, making it look like we're getting closer to EPOLLET support.

There are showstoppers there such as the listeners which cannot support
this. But given that most of the epoll_ctl() dance comes from the
connections, we can try to enable edge-triggered polling on connections.

What this patch does is to add a new global tunable "tune.fd.edge-triggered",
that makes fd_insert() automatically set an et_possible bit on the fd if
the I/O callback is conn_fd_handler. When the epoll code sees an update
for such an FD, it immediately registers it in both directions the first
time and doesn't update it anymore.

On a few tests it proved quite useful with a 14% request rate increase in
a H2->H1 scenario, reducing the epoll_ctl() calls from 2 per request to
2 per connection.

The option is obviously disabled by default as bugs are still expected,
particularly around the subscribe() code where it is possible that some
layers do not always re-attempt reading data after being woken up.
2020-06-19 14:21:46 +02:00
Willy Tarreau
e406386542 MINOR: activity: rename confusing poll_* fields in the output
We have poll_drop, poll_dead and poll_skip which are confusingly named
like their poll_io and poll_exp counterparts except that they are not
per poll() call but per-fd. This patch renames them to poll_drop_fd,
poll_dead_fd and poll_skip_fd for this reason.
2020-06-17 20:35:33 +02:00
Willy Tarreau
e545153c50 MINOR: activity: report the number of times poll() reports I/O
The "show activity" output mentions a number of indicators to explain
wake up reasons but doesn't have the number of times poll() sees some
I/O. And given that multiple events can happen simultaneously, it's
not always possible to deduce this metric by subtracting.

This patch adds a new "poll_io" counter that allows one to see how
often poll() returns with at least one active FD. This should help
detect stuck events and measure various ratios of poll sub-metrics.
2020-06-17 20:25:18 +02:00
Willy Tarreau
b2551057af CLEANUP: include: tree-wide alphabetical sort of include files
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
2020-06-11 10:18:59 +02:00
Willy Tarreau
5b9cde4820 REORG: include: move THREAD_LOCAL and __decl_thread() to compiler.h
Since these are used as type attributes or conditional clauses, they
are used about everywhere and should not require a dependency on
thread.h. Moving them to compiler.h along with other similar statements
like ALIGN() etc looks more logical; this way they become part of the
base API. This made it possible to remove thread-t.h from ~12 files; one
was found to only require thread-t and not thread, and dict.c was found to
require thread.h.
2020-06-11 10:18:59 +02:00
Willy Tarreau
3727a8a083 REORG: include: move signal.h to haproxy/signal{,-t}.h
No change was necessary. Include from wdt.c was dropped since unneeded.
2020-06-11 10:18:58 +02:00
Willy Tarreau
f268ee8795 REORG: include: split global.h into haproxy/global{,-t}.h
global.h was one of the messiest files, it has accumulated tons of
implicit dependencies and declares many globals that make almost all
other file include it. It managed to silence a dependency loop between
server.h and proxy.h by being well placed to pre-define the required
structs, forcing struct proxy and struct server to be forward-declared
in a significant number of files.

It was split in two: one part holds the global struct definition and the
few macros and flags, and the other contains the function prototypes.

The UNIX_MAX_PATH definition was moved to compat.h.
2020-06-11 10:18:58 +02:00
Willy Tarreau
0f6ffd652e REORG: include: move fd.h to haproxy/fd{,-t}.h
A few includes were missing in each file. A definition of
struct polled_mask was moved to fd-t.h. The MAX_POLLERS macro was
moved to defaults.h

Stdio used to be silently inherited from whatever path but it's needed
for list_pollers() which takes a FILE* and which can thus not be
forward-declared.
2020-06-11 10:18:57 +02:00
Willy Tarreau
48fbcae07c REORG: tools: split common/standard.h into haproxy/tools{,-t}.h
And also rename standard.c to tools.c. The original split between
tools.h and standard.h dates from version 1.3-dev and was mostly an
accident. This patch moves the files back to what they were expected
to be, and takes care of not changing anything else. However this
time tools.h was split between functions and types, because it contains
a small number of commonly used macros and structures (e.g. name_desc)
which in turn cause the massive list of includes of tools.h to conflict
with the callers.

They remain the ugliest files of the whole project and definitely need
to be cleaned and split apart. A few types are defined there only for
functions provided there, and some parts are even OS-specific and should
move somewhere else, such as the symbol resolution code.
2020-06-11 10:18:57 +02:00
Willy Tarreau
c2f7c5895c REORG: include: move common/ticks.h to haproxy/ticks.h
Nothing needed to be changed, there are no exported types.
2020-06-11 10:18:57 +02:00
Willy Tarreau
a04ded58dc REORG: include: move activity to haproxy/
This moves types/activity.h to haproxy/activity-t.h and
proto/activity.h to haproxy/activity.h.

The macros defining the bit field values for the profiling variable
were moved to the type file to be more future-proof.
2020-06-11 10:18:57 +02:00
Willy Tarreau
92b4f1372e REORG: include: move time.h from common/ to haproxy/
This one is included almost everywhere and used to rely on a few other
.h that are not needed (unistd, stdlib, standard.h). It could possibly
make sense to split it into multiple parts to distinguish operations
performed on timers and the internal time accounting, but at this point
it does not appear much important.
2020-06-11 10:18:56 +02:00
Willy Tarreau
3f567e4949 REORG: include: split hathreads into haproxy/thread.h and haproxy/thread-t.h
This splits the hathreads.h file into types+macros and functions. Given
that most users of this file used to include it only to get the definition
of THREAD_LOCAL and MAXTHREADS, the bare minimum was placed into thread-t.h
(i.e. types and macros).

All the thread management was left to haproxy/thread.h. It's worth noting
the drop of the trailing "s" in the name, to remove the permanent confusion
that arises between this one and the system implementation (no "s") and the
makefile's option (no "s").

For consistency, src/hathreads.c was also renamed thread.c.

A number of files were updated to only include thread-t which is the one
they really needed.

Some future improvements are possible like replacing empty inlined
functions with macros for the thread-less case, as building at -O0 disables
inlining and causes these ones to be emitted. But this really is cosmetic.
2020-06-11 10:18:56 +02:00
Willy Tarreau
58017eef3f REORG: include: move the BUG_ON() code to haproxy/bug.h
This one used to be stored into debug.h but the debug tools got larger
and require a lot of other includes, which can't use BUG_ON() anymore
because of this. It does not make sense and instead this macro should
be placed into the lower includes and given its omnipresence, the best
solution is to create a new bug.h with the few surrounding macros needed
to trigger bugs and place assertions anywhere.

Another benefit is that it won't be required to add include <debug.h>
anymore to use BUG_ON, it will automatically be covered by api.h. No
less than 32 occurrences were dropped.

The FSM_PRINTF macro was dropped since not used at all anymore (probably
since 1.6 or so).
2020-06-11 10:18:56 +02:00
Willy Tarreau
4c7e4b7738 REORG: include: update all files to use haproxy/api.h or api-t.h if needed
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:

  - common/config.h
  - common/compat.h
  - common/compiler.h
  - common/defaults.h
  - common/initcall.h
  - common/tools.h

The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.

In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.

No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
2020-06-11 10:18:42 +02:00
Willy Tarreau
3858b122a6 CLEANUP: remove support for USE_MY_EPOLL
This was made to support epoll on patched 2.4 kernels, and on early 2.6
using alternative libcs thanks to the arch-specific syscall definitions.
All the features we support have been around since 2.6.2 and present in
glibc since 2.3.2, neither of which is found in the field anymore. Let's
simply drop this and use epoll normally.
2020-03-10 07:08:10 +01:00
Willy Tarreau
55c5399846 MINOR: epoll: always initialize all of epoll_event to please valgrind
valgrind complains that epoll_ctl() uses an epoll_event in which we
have only set the part we use from the data field (i.e. the fd). Tests
show that pre-initializing the struct in the stack doesn't have a
measurable impact so let's do it.
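
As an illustration only (Linux-specific, with a hypothetical helper name
sketch_epoll_add()), fully clearing a stack-allocated epoll_event before
filling it could look like this:

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>

/* Adds fd to epfd for the given events. The event is a plain local
 * variable and is zeroed first, so the bytes of the data union that
 * we do not use are initialized as well.
 * Returns epoll_ctl()'s result: 0 on success, -1 on error.
 */
static int sketch_epoll_add(int epfd, int fd, unsigned int events)
{
	struct epoll_event ev;

	assert(epfd >= 0 && fd >= 0);
	memset(&ev, 0, sizeof(ev));
	ev.events = events;
	ev.data.fd = fd;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}
```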
2020-02-26 14:36:27 +01:00
Willy Tarreau
03e7853581 BUILD: remove obsolete support for -mregparm / USE_REGPARM
This used to be a minor optimization on ix86 where registers are scarce
and the calling convention not very efficient, but this platform is not
relevant enough anymore to warrant all this dirt in the code for the sake
of saving 1 or 2% of performance. Modern platforms don't use this at all
since their calling convention already defaults to using several registers,
so it is better to get rid of this once and for all.
2020-02-25 07:41:47 +01:00
Willy Tarreau
902871dd07 CLEANUP: epoll: place the struct epoll_event in the stack
Historically we used to have a global epoll_event for various
manipulations involving epoll_ctl() and when threads were added,
this was turned to a thread_local, which is needlessly expensive
since it's just a temporary variable. Let's move it to a local
variable wherever it's called instead.
2020-02-21 11:21:12 +01:00
Willy Tarreau
5d7dcc2a8e OPTIM: epoll: always poll for recv if neither active nor ready
The cost of enabling polling in one direction with epoll is very high
because it requires one syscall per FD and per direction change. In
addition we don't know about input readiness until we either try to
receive() or enable polling and watch the result. With HTTP keep-alive,
both are equally expensive as it's very uncommon to see the server
instantly respond (unless it's a second stage of the same process on
localhost, which has become much less common with threads).

But when a connection is established it's also quite usual to have to
poll for sending (except on localhost or UNIX sockets where it almost
always instantly works). So this cost of polling could be factored out
with the second step if both were enabled together.

This is the idea behind this patch. What it does is to always enable
polling for Rx if it's not ready and at least one direction is active.
This means that if it's not explicitly disabled, or if it was but in a
state that causes the loss of the information (rx ready cannot be
guessed), then let's take any opportunity for a polling change to
enable it at the same time, and learn about rx readiness for free.
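
One possible reading of this rule, written as a small pure function. The flag
names and sk_events_to_poll() are invented for the illustration and do not
match HAProxy's FD state bits; the "never unregister Rx unless ready and
blocked" part is not modelled:

```c
#include <assert.h>

/* Hypothetical per-FD state bits */
enum {
	SK_RX_ACTIVE = 0x1, /* upper layer wants to receive */
	SK_RX_READY  = 0x2, /* last receive attempt did not hit EAGAIN */
	SK_TX_ACTIVE = 0x4, /* upper layer wants to send */
};

/* Hypothetical poller directions */
enum {
	SK_POLL_IN  = 0x1,
	SK_POLL_OUT = 0x2,
};

/* Returns the directions to request from the poller. Rx polling is
 * piggybacked on any polling change as long as Rx readiness is unknown.
 */
static unsigned int sk_events_to_poll(unsigned int st)
{
	unsigned int ev = 0;
	int any_active = (st & (SK_RX_ACTIVE | SK_TX_ACTIVE)) != 0;

	assert((st & ~(SK_RX_ACTIVE | SK_RX_READY | SK_TX_ACTIVE)) == 0);
	if (st & SK_TX_ACTIVE)
		ev |= SK_POLL_OUT;
	if ((st & SK_RX_ACTIVE) || (any_active && !(st & SK_RX_READY)))
		ev |= SK_POLL_IN;
	return ev;
}
```

With only SK_TX_ACTIVE set, as right after a connect(), the function asks for
both directions at once, so no second polling change is needed to start
receiving.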

In addition the FD never gets unregistered for Rx unless it's ready
and was blocked (buffer full). This avoids a lot of the flip-flop
behaviour at beginning and end of requests.

On a test with 10k requests in keep-alive, the difference is quite
noticeable:

Before:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.67    0.010847           0     20078           epoll_ctl
 16.33    0.002117           0      2231           epoll_wait
  0.00    0.000000           0        20        20 connect
------ ----------- ----------- --------- --------- ----------------
100.00    0.012964                 22329        20 total

After:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.35    0.003351           1      2644           epoll_wait
  2.36    0.000082           4        20        20 connect
  1.29    0.000045           0        66           epoll_ctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.003478                  2730        20 total

It may also save a recvfrom() after connect() by changing the following
sequence, effectively saving one epoll_ctl() and one recvfrom() :

           before              |            after
  -----------------------------+----------------------------
  - connect()                  |  - connect()
  - epoll_ctl(add,out)         |  - epoll_ctl(add, in|out)
  - sendto()                   |  - epoll_wait() = out
  - epoll_ctl(mod,in|out)      |  - send()
  - epoll_wait() = out         |  - epoll_wait() = in|out
  - recvfrom() = EAGAIN        |  - recvfrom() = OK
  - epoll_ctl(mod,in)          |  - recvfrom() = EAGAIN
  - epoll_wait() = in          |  - epoll_ctl(mod, in)
  - recvfrom() = OK            |  - epoll_wait()
  - recvfrom() = EAGAIN        |
  - epoll_wait()               |
    (...)

Now on a 10M req test on 16 threads with 2k concurrent conns and 415kreq/s,
we see 190k updates total and 14k epoll_ctl() only.
2019-12-27 16:38:47 +01:00
Willy Tarreau
11ef0837af MINOR: pollers: add a new flag to indicate pollers reporting ERR & HUP
In practice it's all pollers except select(). It turns out that we're
keeping some legacy code only for select and enforcing it on all
pollers. Let's offer the pollers the ability to declare that they
do not need that.
2019-12-27 14:04:33 +01:00
Willy Tarreau
6b3089856f MEDIUM: fd: do not use the FD_POLL_* flags in the pollers anymore
As mentioned in previous commit, these flags do not map well to
modern poller capabilities. Let's use the FD_EV_*_{R,W} flags instead.
This first patch only performs a 1-to-1 mapping making sure that the
previously reported flags are still reported identically while using
the closest possible semantics in the pollers.

It's worth noting that kqueue will now support improvements such as
returning distinctions between shut and errors on each direction,
though this is not exploited for now.
2019-09-06 19:09:56 +02:00
Willy Tarreau
5bee3e2f47 MEDIUM: fd: remove the FD_EV_POLLED status bit
Since commit 7ac0e35f2 in 1.9-dev1 ("MAJOR: fd: compute the new fd polling
state out of the fd lock") we've started to update the FD POLLED bit a
bit more aggressively. Lately with the removal of the FD cache, this bit
is always equal to the ACTIVE bit. There's no point continuing to watch
it and update it anymore, all it does is create confusion and complicate
the code. One interesting side effect is that it now becomes visible that
all fd_*_{send,recv}() operations systematically call updt_fd_polling(),
except fd_cant_recv()/fd_cant_send() which never saw it change.
2019-09-05 09:31:18 +02:00
Olivier Houchard
53055055c5 MEDIUM: pollers: Remember the state for read and write for each thread.
In the poller code, instead of just remembering whether we're currently
polling an fd or not, remember whether we're polling it for writing and/or
for reading. That way, we can avoid modifying the polling if it's already
polled as needed.
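
A simplified sketch of how remembering both directions lets a poller skip
useless changes. The enum values and sk_polling_change() are hypothetical
names chosen for the example, not identifiers from the patch:

```c
#include <assert.h>

/* Hypothetical direction bits remembered per FD and per thread */
enum {
	SK_DIR_RD = 0x1,
	SK_DIR_WR = 0x2,
};

/* Hypothetical operations a poller may have to perform */
enum sk_op {
	SK_OP_NONE, /* already polled as needed: no syscall */
	SK_OP_ADD,  /* not polled yet */
	SK_OP_MOD,  /* polled, but for different directions */
	SK_OP_DEL,  /* polled, but nothing is wanted anymore */
};

/* Compares the remembered polling state with the wanted one and
 * returns the operation needed to go from one to the other.
 */
static enum sk_op sk_polling_change(unsigned int polled, unsigned int wanted)
{
	assert(!(polled & ~3u) && !(wanted & ~3u));
	if (polled == wanted)
		return SK_OP_NONE;
	if (!polled)
		return SK_OP_ADD;
	if (!wanted)
		return SK_OP_DEL;
	return SK_OP_MOD;
}
```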
2019-07-31 14:54:41 +02:00
Olivier Houchard
305d5ab469 MAJOR: fd: Get rid of the fd cache.
Now that the architecture was changed so that attempts to receive/send data
always come from the upper layers, instead of them only trying to do so when
the lower layer let them know they could try, we can finally get rid of the
fd cache. We don't really need it anymore, and removing it gives us a small
performance boost.
2019-07-31 14:12:55 +02:00
Willy Tarreau
2ae84e445d MEDIUM: poller: separate the wait time from the wake events
We have been abusing the do_poll()'s timeout for a while, making it zero
whenever there is some known activity. The problem this poses is that it
complicates activity diagnostic by incrementing the poll_exp field for
each known activity. It also requires extra computations that could be
avoided.

This change passes a "wake" argument to say that the poller must not
sleep. This simplifies the operations and allows one to differentiate
expirations from activity.
2019-05-28 17:25:21 +02:00
Olivier Houchard
cb6c9274ae MEDIUM: pollers: Use the new _HA_ATOMIC_* macros.
Use the new _HA_ATOMIC_* macros and add barriers where needed.
2019-03-11 17:02:38 +01:00
Willy Tarreau
beb859abce MINOR: polling: add an option to support busy polling
In some situations, especially when dealing with low latency on processors
supporting a variable frequency or when running inside virtual machines,
each time the process waits for an I/O using the poller, the processor
goes back to sleep or is offered to another VM for a long time, and it
causes excessively high latencies.

A solution to this provided by this patch is to enable busy polling using
a global option. When busy polling is enabled, the pollers never sleep and
loop over themselves waiting for an I/O event to happen or for a timeout
to occur. On multi-processor machines it can significantly overheat the
processor but it usually results in much lower latencies.

A typical test consisting in injecting traffic over a single connection at
a time over the loopback shows a bump from 4640 to 8540 connections per
second on forwarded connections, indicating a latency reduction of 98
microseconds for each connection, and a bump from 12500 to 21250 for
locally terminated connections (redirects), indicating a reduction of
33 microseconds.

It is only usable with epoll and kqueue because select() and poll()'s
API is not convenient for such usages, and the level of performance they
are used in doesn't benefit from this anyway.

The option, which obviously remains disabled by default, can be turned
on using "busy-polling" in the global section, and turned off later
using "no busy-polling". Its status is reported in "show info" to help
troubleshooting suspicious CPU spikes.
2018-11-22 19:47:30 +01:00
Willy Tarreau
48f8bc1368 MINOR: poller: move the call of tv_update_date() back to the pollers
The reason behind this will be to be able to compute a timeout when
busy polling.
2018-11-22 18:57:37 +01:00
Willy Tarreau
609aad9e73 REORG: time/activity: move activity measurements to activity.{c,h}
At the moment the situation with activity measurement is quite tricky
because the struct activity is defined in global.h and declared in
haproxy.c, with operations made in time.h and relying on freq_ctr
which are defined in freq_ctr.h which itself includes time.h. It's
barely possible to touch any of these files without breaking all the
circular dependency.

Let's move all this stuff to activity.{c,h} and be done with it. The
measurement of active and stolen time is now done in a dedicated
function called just after tv_before_poll() instead of mixing the two,
which used to be a lazy (but convenient) decision.

No code was changed, stuff was just moved around.
2018-11-22 11:48:41 +01:00
Willy Tarreau
7e9c4ae4de MINOR: poller: move time and date computation out of the pollers
By placing this code into time.h (tv_entering_poll() and tv_leaving_poll())
we can remove the logic from the pollers and prepare for extending this to
offer more accurate time measurements.
2018-10-17 19:59:43 +02:00
Willy Tarreau
f37ba94768 MINOR: fd: centralize poll timeout computation in compute_poll_timeout()
The 4 pollers all contain the same code used to compute the poll timeout.
This is pointless, let's centralize this into fd.h. This also gets rid of
the useless SCHEDULER_RESOLUTION macro which used to work around a very old
linux 2.2 bug causing select() to wake up slightly before the timeout.
2018-10-17 19:59:43 +02:00
Willy Tarreau
60b639ccbe MEDIUM: hathreads: implement a more flexible rendez-vous point
The current synchronization point enforces certain restrictions which
are hard to work around in certain areas of the code. The fact that the
critical code can only be called from the sync point itself is a problem
for some callback-driven parts. The "show fd" command for example is
fragile regarding this.

Also it is expensive in terms of CPU usage because it wakes every other
thread just to be sure all of them join the rendez-vous point. This is a
problem because sleeping threads should not need to be woken up just
to learn that they have nothing to do.

Here we implement a different approach. We keep track of harmless threads,
which are defined as those either doing nothing, or doing harmless things.
The rendez-vous is used "for others" as a way for a thread to isolate itself.
A thread then requests to be alone using thread_isolate() when approaching
the dangerous area, and then waits until all other threads are either doing
the same or are doing something harmless (typically polling). The function
only returns once the thread is guaranteed to be alone, and the critical
section is terminated using thread_release().
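
The mechanism can be sketched with C11 atomics as below. This is a simplified
illustration, not the patch's code: the sk_* names and the three global masks
are assumptions, busy-waiting is done without any CPU relax hint, and the
poller-side helpers that mark a thread harmless around poll() are not shown:

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_ulong sk_all_threads = 1; /* one bit per running thread */
static atomic_ulong sk_harmless;        /* threads doing harmless things */
static atomic_ulong sk_want_rdv;        /* threads asking to be alone */

/* Waits until every other thread is harmless, then returns with the
 * caller being the only non-harmless thread.
 */
static void sk_thread_isolate(unsigned long tid_bit)
{
	unsigned long all, old;

	assert(tid_bit != 0);
	atomic_fetch_or(&sk_harmless, tid_bit);
	atomic_fetch_or(&sk_want_rdv, tid_bit);

	for (;;) {
		all = atomic_load(&sk_all_threads);
		old = atomic_load(&sk_harmless);
		if ((old & all) != all)
			continue; /* some thread is still busy */
		/* everyone is harmless: only one requester may leave this state */
		if (atomic_compare_exchange_weak(&sk_harmless, &old, old & ~tid_bit))
			break;
	}
}

/* Ends the critical section. If other threads are waiting for their own
 * isolation, stay harmless until they are done.
 */
static void sk_thread_release(unsigned long tid_bit)
{
	atomic_fetch_and(&sk_want_rdv, ~tid_bit);
	while (atomic_load(&sk_want_rdv) & atomic_load(&sk_all_threads)) {
		atomic_fetch_or(&sk_harmless, tid_bit);
		while (atomic_load(&sk_want_rdv) & atomic_load(&sk_all_threads))
			;
		atomic_fetch_and(&sk_harmless, ~tid_bit);
	}
}
```

A caller would wrap the dangerous area between sk_thread_isolate() and
sk_thread_release(), while each thread's poller sets its own bit in the
harmless mask before sleeping and clears it after waking up.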
2018-08-02 17:51:45 +02:00
Olivier Houchard
cb92f5cae4 MINOR: pollers: move polled_mask outside of struct fdtab.
The polled_mask is only used in the pollers, and removing it from the
struct fdtab makes it fit in one 64B cacheline again, on a 64bits machine,
so make it a separate array.
2018-05-06 06:27:34 +02:00
Olivier Houchard
6b96f7289c BUG/MEDIUM: pollers: Use a global list for fd shared between threads.
With the old model, any fd shared by multiple threads, such as listeners
or dns sockets, would only be updated on one thread, which could lead
to missed events or spurious wakeups.
To avoid this, add a global list for fds that are shared, using the same
implementation as the fd cache, and only remove entries from this list
when every thread has updated its poller.

[wt: this will need to be backported to 1.8 but differently so this patch
 must not be backported as-is]
2018-05-06 06:27:09 +02:00
Olivier Houchard
8ef1a6b0d8 BUG/MINOR: fd: Don't clear the update_mask in fd_insert.
Clearing the update_mask bit in fd_insert may lead to duplicate insertion
of fd in fd_updt, that could lead to a write past the end of the array.
Instead, make sure the update_mask bit is cleared by the pollers no matter
what.

This should be backported to 1.8.
[wt: warning: 1.8 doesn't have the lockless fdcache changes and will
 require some careful changes in the pollers]
2018-04-03 19:38:15 +02:00
Willy Tarreau
62a627ac19 MEDIUM: poller: use atomic ops to update the fdtab mask
We don't need to lock the fdtab[].lock anymore since we only have one
modification left (update update_mask). Let's use an atomic AND instead.
2018-02-05 16:02:22 +01:00
Willy Tarreau
038e54cb3c MINOR: epoll: get rid of the now useless fd_compute_new_polled_status()
Do not call it anymore and avoid updating the fdstate. We're not very far
from removing the fd lock it seems.
2018-02-05 16:02:22 +01:00
Willy Tarreau
4979592907 BUG/MINOR: epoll/threads: only call epoll_ctl(DEL) on polled FDs
Commit d9e7e36 ("BUG/MEDIUM: epoll/threads: use one epoll_fd per thread")
addressed an issue with the polling and required that cloned FDs are removed
from all polling threads on close. But in fact it does it for all bound
threads, some of which may not necessarily poll the FD. This is harmless,
but it may also make it harder later to deal with FD migration between
threads. Better use polled_mask which only reports threads still aware
of the FD instead of thread_mask.

This fix should be backported to 1.8.
2018-01-31 09:49:29 +01:00
Willy Tarreau
745c60eac6 CLEANUP: fd: remove the unused "new" field
This field has been unused since 1.6, it's only updated and never
tested. Let's remove it.
2018-01-29 16:02:59 +01:00
Willy Tarreau
ce036bc2da MINOR: polling: make epoll and kqueue not depend on maxfd anymore
Maxfd is really only useful to poll() and select(), yet epoll and
kqueue reference it almost by mistake :
  - cloning of the initial FDs (maxsock should be used here)
  - max polled events, it's maxpollevents which should be used here.

Let's fix these places.
2018-01-29 15:18:54 +01:00
Christopher Faulet
3e805ed08e BUILD: epoll/threads: Add test on MAX_THREADS to avoid warnings when compiled without threads
When HAProxy is compiled without threads, gcc throws the following warnings:

  src/ev_epoll.c:222:3: warning: array subscript is outside array bounds [-Warray-bounds]
  ...
  src/ev_epoll.c:199:11: warning: array subscript is outside array bounds [-Warray-bounds]
  ...

Of course, this is not a bug. In such case, tid is always equal to 0. But to
avoid the noise, a check on MAX_THREADS in "if (tid)" lines makes gcc happy.

This patch should be backported in 1.8 with the commit d9e7e36c ("BUG/MEDIUM:
epoll/threads: use one epoll_fd per thread").
2018-01-25 17:52:57 +01:00
Willy Tarreau
d9e7e36c6e BUG/MEDIUM: epoll/threads: use one epoll_fd per thread
There currently is a problem regarding epoll(). While select() and poll()
compute their polling state on the fly upon each call, epoll() keeps a
shared state between all threads via the epoll_fd. The problem is that
once an fd is registered on *any* thread, all other threads receive
events for that FD as well. It is clearly visible when binding a listener
to a single thread like in the configuration below where all 4 threads
will work, 3 of them simply spinning to skip the event :

    global
        nbthread 4

    frontend foo
        bind :1234 process 1/1

The worst case happens when some slow operations are in progress on a
busy thread, preventing it from processing its task and causing the
other ones to wake up not being able to do anything with this event.
Typically computing a large TLS key will delay processing of next
events on the same thread while others will still wake up.

All this simply shows that the poller must remain thread-specific, with
its own events and its own ability to sleep when it doesn't have anything
to do.

This patch does exactly this. For this, it proceeds like this :

   - have one epoll_fd per thread instead of one per process
   - initialize these epoll_fd when threads are created.
   - mark all known FDs as updated so that the next invocation of
     _do_poll() recomputes their polling status (including a possible
     removal of undesired polling from the original FD) ;
   - use each fd's polled_mask to maintain an accurate status of
     the current polling activity for this FD.
   - when scanning updates, only focus on events whose new polling
     status differs from the existing one
   - during updates, always verify the thread_mask to resist migration
   - on __fd_clo(), for cloned FDs (typically listeners inherited
     from the parent during a graceful shutdown), run epoll_ctl(DEL)
     on all epoll_fd. This is the reason why epoll_fd is stored in a
     shared array and not in a thread_local storage. Note: maybe this
     can be moved to an update instead.

Interestingly, this shows that we don't need the FD's old state anymore
and that we only use it to convert it to the new state based on stable
information. It appears clearly that the FD code can be further improved
by computing the final state directly when manipulating it.

With this change, the config above goes from 22000 cps at 380% CPU to
43000 cps at 100% CPU : not only the 3 unused threads are not activated,
but they do not disturb the activity anymore.

The output of "show activity" before and after the patch on a 4-thread
config where a first listener on thread 2 forwards over SSL to threads
3 & 4 shows a much smaller amount of undesired events (thread 1
doesn't wake up anymore, poll_skip remains zero, fd_skip stays low) :

  // before: 400% CPU, 7700 cps, 13 seconds
  loops: 11380717 65879 5733468 5728129
  wake_cache: 0 63986 317547 314174
  wake_tasks: 0 0 0 0
  wake_applets: 0 0 0 0
  wake_signal: 0 0 0 0
  poll_exp: 0 63986 317547 314174
  poll_drop: 1 0 49981 48893
  poll_dead: 65514 0 31334 31934
  poll_skip: 46293690 34071 22867786 22858208
  fd_skip: 66068135 174157 33732685 33825727
  fd_lock: 0 2 2809 2905
  fd_del: 0 494361 80890 79464
  conn_dead: 0 0 0 0
  stream: 0 407747 50526 49474
  empty_rq: 11380718 1914 5683023 5678715
  long_rq: 0 0 0 0

  // after: 200% cpu, 9450 cps, 11 seconds
  loops: 17 66147 1001631 450968
  wake_cache: 0 66119 865139 321227
  wake_tasks: 0 0 0 0
  wake_applets: 0 0 0 0
  wake_signal: 0 0 0 0
  poll_exp: 0 66119 865139 321227
  poll_drop: 6 5 38279 60768
  poll_dead: 0 0 0 0
  poll_skip: 0 0 0 0
  fd_skip: 54 172661 4411407 2008198
  fd_lock: 0 0 10890 5394
  fd_del: 0 492829 58965 105091
  conn_dead: 0 0 0 0
  stream: 0 406223 38663 61338
  empty_rq: 18 40 962999 390549
  long_rq: 0 0 0 0

This patch presents a few risks but fixes a real problem with threads,
and as such it needs to be backported to 1.8. It depends on previous patch
("MINOR: fd: add a bitmask to indicate that an FD is known by the poller").

Special thanks go to Samuel Reed for providing a large amount of useful
debugging information and for testing fixes.
2018-01-23 15:48:08 +01:00
Willy Tarreau
ebc78d78a2 BUG/MEDIUM: fd: maintain a per-thread update mask
Since the fd update tables are per-thread, we need to have a bit per
thread to indicate whether an update exists, otherwise this can lead
to lost update events every time multiple threads want to update the
same FD. In practice *for now*, it only happens at start time when
listeners are enabled and ask for polling after facing their first
EAGAIN. But since the pollers are still shared, a lost event is still
recovered by a neighbor thread. This will not reliably work anymore
with per-thread pollers, where it has been observed a few times on
startup that a single-threaded listener would not always accept
incoming connections upon startup.
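
The principle can be sketched as follows, with C11 atomics and hypothetical
names (struct sk_updates, sk_update_mask, sk_queue_update() and
sk_flush_updates() are not taken from the patch). Each thread owns one bit per
FD, so an update requested by one thread cannot hide another thread's:

```c
#include <assert.h>
#include <stdatomic.h>

#define SK_MAX_FD 16 /* hypothetical table size for the example */

/* Per-thread list of FDs whose polling state must be recomputed */
struct sk_updates {
	int list[SK_MAX_FD];
	int count;
};

/* One bit per thread for each FD: "this thread has a pending update" */
static atomic_ulong sk_update_mask[SK_MAX_FD];

/* Queues fd in the calling thread's list unless that thread already
 * queued it. Returns 1 if queued, 0 if it was already pending.
 */
static int sk_queue_update(struct sk_updates *u, int fd, unsigned long tid_bit)
{
	assert(fd >= 0 && fd < SK_MAX_FD);
	if (atomic_fetch_or(&sk_update_mask[fd], tid_bit) & tid_bit)
		return 0;
	u->list[u->count++] = fd;
	return 1;
}

/* Poller side: once the updates are processed, clear only the calling
 * thread's bit so that other threads' pending updates are preserved.
 */
static void sk_flush_updates(struct sk_updates *u, unsigned long tid_bit)
{
	for (int i = 0; i < u->count; i++)
		atomic_fetch_and(&sk_update_mask[u->list[i]], ~tid_bit);
	u->count = 0;
}
```

Since an FD can be queued at most once per thread between two flushes, the
per-thread list can never hold more than SK_MAX_FD entries.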

It's worth noting that during this code review it appeared that the
"new" flag in the fdtab isn't used anymore.

This fix should be backported to 1.8.
2018-01-23 15:41:19 +01:00
Willy Tarreau
d80cb4ee13 MINOR: global: add some global activity counters to help debugging
A number of counters have been added at special places helping better
understanding certain bug reports. These counters are maintained per
thread and are shown using "show activity" on the CLI. The "clear
counters" commands also reset these counters. The output is sent as a
single write(), which currently produces up to about 7 kB of data for
64 threads. If more counters are added, it may be necessary to write
into multiple buffers, or to reset the counters.

To backport to 1.8 to help collect more detailed bug reports.
2018-01-23 15:38:33 +01:00
Christopher Faulet
2a944ee16b BUILD: threads: Rename SPIN/RWLOCK macros using HA_ prefix
This removes any name conflicts, especially on Solaris.
2017-11-07 11:10:24 +01:00
Willy Tarreau
f65610a83d CLEANUP: threads: rename process_mask to thread_mask
It was a leftover from the last cleaning session; this mask applies
to threads and calling it process_mask is a bit confusing. It's the
same in fd, task and applets.
2017-10-31 16:06:06 +01:00
Christopher Faulet
cd7879adc2 BUG/MEDIUM: threads: Run the poll loop on the main thread too
There was a flaw in the way the threads were created. The main one was just used
to create all the others and then waited to exit. Now, it is used to run a poll
loop. So we only create nbthread-1 threads.

This also fixes a bug about the compression filter when there is only 1 thread
(nbthread == 1 or no threads support). The bug was in the way thread-local
resources were initialized. Per-thread init/deinit callbacks were never called
for the main process. So, with nbthread set to 1, some buffers remained
uninitialized.
2017-10-31 13:58:33 +01:00