384 Commits

Author SHA1 Message Date
Amaury Denoyelle
1a58aca84e MINOR: connection: use the srv pointer for the srv conn hash
The pointer of the target server is used as a first parameter for the
server connection hash calcul. This prevents the hash to be null when no
specific parameters are present, and can serve as a simple defense
against an attacker trying to reuse a non-conform connection.
2021-02-12 12:33:05 +01:00
Amaury Denoyelle
81c6f76d3e MINOR: connection: prepare hash calcul for server conns
This is a preliminary work for the calcul of the backend connection
hash. A structure conn_hash_params is the input for the operation,
containing the various specific parameters of a connection.

The high bits of the hash will reflect the parameters present as input.
A set of macros is written to manipulate the connection hash and extract
the parameters/payload.
2021-02-12 12:33:05 +01:00
Amaury Denoyelle
f232cb3e9b MEDIUM: connection: replace idle conn lists by eb trees
The server idle/safe/available connection lists are replaced with ebmb-
trees. This is used to store backend connections, with the new field
connection hash as the key. The hash is a 8-bytes size field, used to
reflect specific connection parameters.

This is a preliminary work to be able to reuse connection with SNI,
explicit src/dst address or PROXY protocol.
2021-02-12 12:33:05 +01:00
Willy Tarreau
746b0515a4 MEDIUM: connection: make use of the control layer check_events/ignore_events
This changes the subscribe/unsubscribe functions to rely on the control
layer's check_events/ignore_events. At the moment only the socket version
of these functions is present so the code should basically be the same.
2020-12-11 17:06:11 +01:00
Willy Tarreau
2ded48dd27 MINOR: connection: make conn_sock_drain() use the control layer's ->drain()
Now we don't touch the fd anymore there, instead we rely on the ->drain()
provided by the control layer. As such the function was renamed to
conn_ctrl_drain().
2020-12-11 16:26:01 +01:00
Willy Tarreau
586f71b43f REORG: connection: move the socket iocb (conn_fd_handler) to sock.c
conn_fd_handler() is 100% specific to socket code. It's about time
it moves to sock.c which manipulates socket FDs. With it comes
conn_fd_check() which tests for the socket's readiness. The ugly
connection status check at the end of the iocb was moved to an inlined
function in connection.h so that if we need it for other socket layers
it's not too hard to reuse.

The code was really only moved and not changed at all.
2020-12-11 16:26:00 +01:00
Willy Tarreau
827fee7406 MINOR: connection: remove sock-specific code from conn_sock_send()
The send() loop present in this function and the error handling is already
present in raw_sock_from_buf(). Let's rely on it instead and stop touching
the FD from this place. The send flag was changed to use a more agnostic
CO_SFL_*. The name was changed to "conn_ctrl_send()" to remind that it's
meant to be used to send at the lowest level.
2020-12-11 16:25:11 +01:00
Willy Tarreau
8b250ba738 CLEANUP: connection: open-code conn_cond_update_polling() and update the comment
This last call to conn_cond_update_polling() is now totally misleading as
the function only stops polling in case of unrecoverable connection error.
Let's open-code the test to make it more prominent and explain what we're
trying to do there. It's even almost certain this code is never executed
anymore, as the only remaining case should be a mux's wake function setting
CO_FL_ERROR without disabling the polling, but they need to be audited first
to make sure this is the case.
2020-12-11 11:19:24 +01:00
Willy Tarreau
5a1d439225 CLEANUP: connection: use fd_stop_both() instead of conn_stop_polling()
conn_stop_polling() in fact only calls fd_stop_both() after checking
that the ctrl layer is ready. It's the case in conn_fd_check() so
let's get rid of this next-to-last user of this function.
2020-12-11 09:56:53 +01:00
Willy Tarreau
38b4d2eb22 CLEANUP: connection: do not use conn->owner when the session is known
At a few places we used to rely on conn->owner to retrieve the session
while the session is already known. This is not correct because at some
of these points the reason the connection's owner was still the session
(instead of NULL) is a mistake. At one place a comparison is even made
between the session and conn->owner assuming it's valid without checking
if it's NULL. Let's clean this up to use the session all the time.

Note that this will be needed for a forthcoming fix and will have to be
backported.
2020-11-21 15:29:22 +01:00
Willy Tarreau
9b7587a6af MINOR: connection: make sockaddr_alloc() take the address to be copied
Roughly half of the calls to sockadr_alloc() are made to copy an already
known address. Let's optionally pass it in argument so that the function
can handle the copy at the same time, this slightly simplifies its usage.
2020-10-15 21:47:56 +02:00
Willy Tarreau
e53e7ec9d9 CLEANUP: protocol: remove the ->drain() function
No protocol defines it anymore. The last user used to be the monitor-net
stuff that got partially broken already when the tcp_drain() function
moved to conn_sock_drain() with commit e215bba95 ("MINOR: connection:
make conn_sock_drain() work for all socket families") in 1.9-dev2.

A part of this will surely move back later when non-socket connections
arrive with QUIC but better keep the API clean and implement what's
needed in time instead.
2020-10-15 21:47:04 +02:00
Ilya Shipitsin
6b79f38a7a CLEANUP: assorted typo fixes in the code and comments
This is 12th iteration of typo fixes
2020-07-31 11:18:07 +02:00
Christopher Faulet
08016ab82d MEDIUM: connection: Add private connections synchronously in session server list
When a connection is marked as private, it is now added in the session server
list. We don't wait a stream is detached from the mux to do so. When the
connection is created, this happens after the mux creation. Otherwise, it is
performed when the connection is marked as private.

To allow that, when a connection is created, the session is systematically set
as the connectin owner. Thus, a backend connection has always a owner during its
creation. And a private connection has always a owner until its death.

Note that outside the detach() callback, if the call to session_add_conn()
failed, the error is ignored. In this situation, we retry to add the connection
into the session server list in the detach() callback. If this fails at this
step, the multiplexer is destroyed and the connection is closed.
2020-07-15 14:08:14 +02:00
Christopher Faulet
2883fcf65b BUG/MINOR: connection: See new connection as available only on reuse always
When the multiplexer creation is delayed after the handshakes phase, the
connection is added in the available connection list if http-reuse never is not
configured for the backend. But it is a wrong statement. At this step, the
connection is not safe because it is a new connection. So it must be added in
the available connection list only if http-reuse always is used.

No backport needed, this is 2.2-dev.
2020-07-07 14:31:01 +02:00
Christopher Faulet
aa27853ce2 BUG/MEDIUM: connection: Don't consider new private connections as available
When a connection is created and the multiplexer is installed, if the connection
is marked as private, don't consider it as available, regardless the number of
available streams. This test is performed when the mux is installed when the
connection is created, in connect_server(), and when the mux is installed after
the handshakes stage.

No backport needed, this is 2.2-dev.
2020-07-07 14:30:38 +02:00
Willy Tarreau
4d82bf5c2e MINOR: connection: align toremove_{lock,connections} and cleanup into idle_conns
We used to have 3 thread-based arrays for toremove_lock, idle_cleanup,
and toremove_connections. The problem is that these items are small,
and that this creates false sharing between threads since it's possible
to pack up to 8-16 of these values into a single cache line. This can
cause real damage where there is contention on the lock.

This patch creates a new array of struct "idle_conns" that is aligned
on a cache line and which contains all three members above. This way
each thread has access to its variables without hindering the other
ones. Just doing this increased the HTTP/1 request rate by 5% on a
16-thread machine.

The definition was moved to connection.{c,h} since it appeared a more
natural evolution of the ongoing changes given that there was already
one of them declared in connection.h previously.
2020-06-28 10:52:36 +02:00
Willy Tarreau
4cabfc18a3 BUG/MAJOR: connection: always disable ready events once reported
This effectively reverts the two following commits:

  6f95f6e11 ("OPTIM: connection: disable receiving on disabled events when the run queue is too high")
  065a02561 ("MEDIUM: connection: don't stop receiving events in the FD handler")

The problem as reported in issue #662 is that when the events signals
the readiness of input data that has to be forwarded over a congested
stream, the mux will read data and wake the stream up to forward them,
but the buffer full condition makes this impossible immediately, then
nobody in the chain will be able to disable the event after it was
first reported. And given we don't know at the connection level whether
an event was already reported or not, we can't decide anymore to
forcefully stop it if for any reason its processing gets delayed.

The problem is magnified in issue #662 by the fact that a shutdown is
reported with pending data occupying the buffer. The shutdown will
strike in loops and cause the upper layer stream to be notified until
it's handled, but with a buffer full it's not possible to call cs_recv()
hence to purge the event.

All this can only be handled optimally by implementing a lower layer,
direct mux-to-mux forwarding that will not require any scheduling. This
was no wake up will be needed and the event will be instantly handled
or paused for a long time.

For now let's simply revert these optimizations. Running a 1 MB transfer
test over H2 using 8 connections having each 32 streams with a limited
link of 320 Mbps shows the following profile before this fix:

   calls  syscall       (100% CPU)
  ------  -------
  259878  epoll_wait
  519759  clock_gettime
   17277  sendto
   17129  recvfrom
     672  epoll_ctl

And the following one after the fix:

   calls  syscall       (2-3% CPU)
  ------  -------
   17201  sendto
   17357  recvfrom
    2304  epoll_wait
    4609  clock_gettime
    1200  epoll_ctl

Thus the behavior is much better.

No backport is needed as these patches were only in 2.2-dev.

Many thanks to William Dauchy for reporting a lot of details around this
difficult issue.
2020-06-17 17:00:51 +02:00
Willy Tarreau
b2551057af CLEANUP: include: tree-wide alphabetical sort of include files
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
2020-06-11 10:18:59 +02:00
Willy Tarreau
36979d9ad5 REORG: include: move the error reporting functions to from log.h to errors.h
Most of the files dealing with error reports have to include log.h in order
to access ha_alert(), ha_warning() etc. But while these functions don't
depend on anything, log.h depends on a lot of stuff because it deals with
log-formats and samples. As a result it's impossible not to embark long
dependencies when using ha_warning() or qfprintf().

This patch moves these low-level functions to errors.h, which already
defines the error codes used at the same places. About half of the users
of log.h could be adjusted, sometimes revealing other issues such as
missing tools.h. Interestingly the total preprocessed size shrunk by
4%.
2020-06-11 10:18:59 +02:00
Willy Tarreau
6be7849f39 REORG: include: move cfgparse.h to haproxy/cfgparse.h
There's no point splitting the file in two since only cfgparse uses the
types defined there. A few call places were updated and cleaned up. All
of them were in C files which register keywords.

There is nothing left in common/ now so this directory must not be used
anymore.
2020-06-11 10:18:58 +02:00
Willy Tarreau
5e539c9b8d REORG: include: move stream_interface.h to haproxy/stream_interface{,-t}.h
Almost no changes, removed stdlib and added buf-t and connection-t to
the types to avoid a warning.
2020-06-11 10:18:58 +02:00
Willy Tarreau
209108dbbd REORG: include: move ssl_sock.h to haproxy/ssl_sock{,-t}.h
Almost nothing changed, just moved a static inline at the end and moved
an export from the types to the main file.
2020-06-11 10:18:58 +02:00
Willy Tarreau
7ea393d95e REORG: include: move connection.h to haproxy/connection{,-t}.h
The type file is becoming a mess, half of it is for the proxy protocol,
another good part describes conn_streams and mux ops, it would deserve
being split again. At least it was reordered so that elements are easier
to find, with the PP-stuff left at the end. The MAX_SEND_FD macro was moved
to compat.h as it's said to be the value for Linux.
2020-06-11 10:18:58 +02:00
Willy Tarreau
fc77454aff REORG: include: move proto_tcp.h to haproxy/proto_tcp.h
There was no type file. This one really is trivial. A few missing
includes were added to satisfy the exported functions prototypes.
2020-06-11 10:18:58 +02:00
Willy Tarreau
e6ce10be85 REORG: include: move sample.h to haproxy/sample{,-t}.h
This one is particularly tricky to move because everyone uses it
and it depends on a lot of other types. For example it cannot include
arg-t.h and must absolutely only rely on forward declarations to avoid
dependency loops between vars -> sample_data -> arg. In order to address
this one, it would be nice to split the sample_data part out of sample.h.
2020-06-11 10:18:58 +02:00
Willy Tarreau
762d7a5117 REORG: include: move frontend.h to haproxy/frontend.h
There was no type file for this one, it only contains frontend_accept().
2020-06-11 10:18:57 +02:00
Willy Tarreau
0f6ffd652e REORG: include: move fd.h to haproxy/fd{,-t}.h
A few includes were missing in each file. A definition of
struct polled_mask was moved to fd-t.h. The MAX_POLLERS macro was
moved to defaults.h

Stdio used to be silently inherited from whatever path but it's needed
for list_pollers() which takes a FILE* and which can thus not be
forward-declared.
2020-06-11 10:18:57 +02:00
Willy Tarreau
7a00efbe43 REORG: include: move common/namespace.h to haproxy/namespace{,-t}.h
The type was moved out as it's used by standard.h for netns_entry.
Instead of just being a forward declaration when not used, it's an
empty struct, which makes gdb happier (the resulting stripped executable
is the same).
2020-06-11 10:18:57 +02:00
Willy Tarreau
6131d6a731 REORG: include: move common/net_helper.h to haproxy/net_helper.h
No change was necessary.
2020-06-11 10:18:57 +02:00
Willy Tarreau
58017eef3f REORG: include: move the BUG_ON() code to haproxy/bug.h
This one used to be stored into debug.h but the debug tools got larger
and require a lot of other includes, which can't use BUG_ON() anymore
because of this. It does not make sense and instead this macro should
be placed into the lower includes and given its omnipresence, the best
solution is to create a new bug.h with the few surrounding macros needed
to trigger bugs and place assertions anywhere.

Another benefit is that it won't be required to add include <debug.h>
anymore to use BUG_ON, it will automatically be covered by api.h. No
less than 32 occurrences were dropped.

The FSM_PRINTF macro was dropped since not used at all anymore (probably
since 1.6 or so).
2020-06-11 10:18:56 +02:00
Willy Tarreau
8d36697dee REORG: include: move base64.h, errors.h and hash.h from common to to haproxy/
These ones do not depend on any other file. One used to include
haproxy/api.h but that was solely for stddef.h.
2020-06-11 10:18:56 +02:00
Willy Tarreau
4c7e4b7738 REORG: include: update all files to use haproxy/api.h or api-t.h if needed
All files that were including one of the following include files have
been updated to only include haproxy/api.h or haproxy/api-t.h once instead:

  - common/config.h
  - common/compat.h
  - common/compiler.h
  - common/defaults.h
  - common/initcall.h
  - common/tools.h

The choice is simple: if the file only requires type definitions, it includes
api-t.h, otherwise it includes the full api.h.

In addition, in these files, explicit includes for inttypes.h and limits.h
were dropped since these are now covered by api.h and api-t.h.

No other change was performed, given that this patch is large and
affects 201 files. At least one (tools.h) was already freestanding and
didn't get the new one added.
2020-06-11 10:18:42 +02:00
Christopher Faulet
3ab504f5ff BUG/MEDIUM: connection: Ignore PP2 unique ID for stream-less connections
It is possible to send a unique ID when the PROXY protocol v2 is used. It relies
on the stream to do so. So we must be sure to have a stream. Locally initiated
connections may not be linked to a stream. For instance, outgoing connections
created by health checks have no stream. Moreover, the stream is not retrieved
for mux-less connections (this bug will be fixed in another commit).

Unfortunately, in make_proxy_line_v2() function, the stream is not tested before
generating the unique-id.  This bug leads to a segfault when a health check is
performed for a server with the PROXY protocol v2 and the unique-id option
enabled. It also crashes for servers using SSL connections with alpn. The bug
was introduced by the commit cf6e0c8a8 ("MEDIUM: proxy_protocol: Support sending
unique IDs using PPv2")

This patch should fix the issue #640. It must be backported to the same versions
as the commit above.
2020-05-26 17:36:01 +02:00
Willy Tarreau
119e50e0cc MINOR: connection: add pp2-never-send-local to support old PP2 behavior
A bug in the PROXY protocol v2 implementation was present in HAProxy up to
version 2.1, causing it to emit a PROXY command instead of a LOCAL command
for health checks. This is particularly minor but confuses some servers'
logs. Sadly, the bug was discovered very late and revealed that some servers
which possibly only tested their PROXY protocol implementation against
HAProxy fail to properly handle the LOCAL command, and permanently remain in
the "down" state when HAProxy checks them. When this happens, it is possible
to enable this global option to revert to the older (bogus) behavior for the
time it takes to contact the affected components' vendors and get them fixed.
This option is disabled by default and acts on all servers having the
"send-proxy-v2" statement.

Older versions were reverted to the old behavior and should not attempt to
be fixed by default again. However a variant of this patch could possibly
be implemented to ask to explicitly send LOCAL if needed by some servers.

More context here:
   https://www.mail-archive.com/haproxy@formilux.org/msg36890.html
   https://www.mail-archive.com/haproxy@formilux.org/msg37218.html
2020-05-22 13:55:32 +02:00
Christopher Faulet
14cd316a1f MAJOR: checks: Use the best mux depending on the protocol for health checks
When a tcp-check connect rule is evaluated, the mux protocol corresponding to
the health-check is chosen. So for TCP based health-checks, the mux-pt is
used. For HTTP based health-checks, the mux-h1 is used. The connection is marked
as private to be sure to not ruse regular HTTP connection for
health-checks. Connections reuse will be evaluated later.

The functions evaluating HTTP send rules and expect rules have been updated to
be HTX compliant. The main change for users is that HTTP health-checks are now
stricter on the HTTP message format. While before, the HTTP formatting and
parsing were minimalist, now messages should be well formatted.
2020-04-27 10:41:07 +02:00
Willy Tarreau
02c88036a6 BUG/MINOR: connection: always send address-less LOCAL PROXY connections
Commit 7f26391bc5 ("BUG/MINOR: connection: make sure to correctly tag
local PROXY connections") revealed that some implementations do not
properly ignore addresses in LOCAL connections (at least Dovecot was
spotted). More context information in the thread below:

   https://www.mail-archive.com/haproxy@formilux.org/msg36890.html

The patch above was using LOCAL on top of local addresses in order to
minimize the risk of breakage but revealed worse than a clean fix. So
let's partially revert it and send pure LOCAL connections instead now.

After a bit of observation, this patch should be progressively backported
to stable branches. However if it reveals new breakage, the backport of
the patch above will have to be reverted from stable branches while other
products work on fixing their code based on the master branch.
2020-04-14 16:02:50 +02:00
Ilya Shipitsin
ce7b00f926 CLEANUP: assorted typo fixes in the code and comments
This is fifth iteration of typo fixes
2020-03-31 17:09:35 +02:00
Olivier Houchard
f0d4dff25c MINOR: connections: Make the "list" element a struct mt_list instead of list.
Make the "list" element a struct mt_list, and explicitely use
list_from_mt_list to get a struct list * where it is used as such, so that
mt_list_for_each_entry will be usable with it.
2020-03-19 22:07:33 +01:00
Olivier Houchard
dc2f2753e9 MEDIUM: servers: Split the connections into idle, safe, and available.
Revamp the server connection lists. We know have 3 lists :
- idle_conns, which contains idling connections
- safe_conns, which contains idling connections that are safe to use even
for the first request
- available_conns, which contains connections that are not idling, but can
still accept new streams (those are HTTP/2 or fastcgi, and are always
considered safe).
2020-03-19 22:07:33 +01:00
Tim Duesterhus
2b7f6c22d8 CLEANUP: connection: Stop directly setting an ist's .ptr
Instead replace the complete `ist` by the value returned from `ist2`.

This was noticed during review of issue #549.
2020-03-14 18:31:58 +01:00
Tim Duesterhus
a8692f3fe0 CLEANUP: connection: Add blank line after declarations in PP handling
This adds the missing blank lines in `make_proxy_line_v2` and
`conn_recv_proxy`. It also adjusts the type of the temporary variable
used for the return value of `recv` to be `ssize_t` instead of `int`.
2020-03-13 17:26:43 +01:00
Tim Duesterhus
cf6e0c8a83 MEDIUM: proxy_protocol: Support sending unique IDs using PPv2
This patch adds the `unique-id` option to `proxy-v2-options`. If this
option is set a unique ID will be generated based on the `unique-id-format`
while sending the proxy protocol v2 header and stored as the unique id for
the first stream of the connection.

This feature is meant to be used in `tcp` mode. It works on HTTP mode, but
might result in inconsistent unique IDs for the first request on a keep-alive
connection, because the unique ID for the first stream is generated earlier
than the others.

Now that we can send unique IDs in `tcp` mode the `%ID` log variable is made
available in TCP mode.
2020-03-13 17:26:43 +01:00
Tim Duesterhus
d1b15b6e9b MINOR: proxy_protocol: Ingest PP2_TYPE_UNIQUE_ID on incoming connections
This patch reads a proxy protocol v2 provided unique ID and makes it
available using the `fc_pp_unique_id` fetch.
2020-03-13 17:25:23 +01:00
Tim Duesterhus
ba837ec367 CLEANUP: proxy_protocol: Use size_t when parsing TLVs
Change `int` to `size_t` for consistency.
2020-03-06 11:16:19 +01:00
Tim Duesterhus
488ee7fb6e BUG/MAJOR: proxy_protocol: Properly validate TLV lengths
This patch fixes PROXYv2 parsing when the payload of the TCP connection is
fused with the PROXYv2 header within a single recv() call.

Previously HAProxy ignored the PROXYv2 header length when attempting to
parse the TLV, possibly interpreting the first byte of the payload as a
TLV type.

This patch adds proper validation. It ensures that:

1. TLV parsing stops when the end of the PROXYv2 header is reached.
2. TLV lengths cannot exceed the PROXYv2 header length.
3. The PROXYv2 header ends together with the last TLV, not allowing for
   "stray bytes" to be ignored.

A reg-test was added to ensure proper behavior.

This patch tries to find the sweat spot between a small and easily
backportable one, and a cleaner one that's more easily adaptable to
older versions, hence why it merges the "if" and "while" blocks which
causes a reindent of the whole block. It should be used as-is for
versions 1.9 to 2.1, the block about PP2_TYPE_AUTHORITY should be
dropped for 2.0 and the block about CRC32C should be dropped for 1.8.

This bug was introduced when TLV parsing was added. This happened in commit
b3e54fe387c7c1ea750f39d3029672d640c499f9. This commit was first released
with HAProxy 1.6-dev1.

A similar issue was fixed in commit 7209c204bd6f3c49132264c7a58f689cdc741c12.

This patch must be backported to HAProxy 1.6+.
2020-03-06 11:11:22 +01:00
Willy Tarreau
6f95f6e111 OPTIM: connection: disable receiving on disabled events when the run queue is too high
In order to save a lot on syscalls, we currently don't disable receiving
on a file descriptor anymore if its handler was already woken up. But if
the run queue is huge and the poller collects a lot of events, this causes
excess wakeups which take CPU time which is not used to flush these tasklets.
This patch simply considers the run queue size to decide whether or not to
stop receiving. Tests show that by stopping receiving when the run queue
reaches ~16 times its configured size, we can still hold maximal performance
in extreme situations like maxpollevents=20k for runqueue_depth=2, and still
totally avoid calling epoll_event under moderate load using default settings
on keep-alive connections.
2020-03-04 19:29:12 +01:00
Willy Tarreau
8de5c4fa15 MEDIUM: connection: only call ->wake() for connect() without I/O
We used to call ->wake() for any I/O event for which there was no
subscriber. But this is a problem because this causes massive wake()
storms since we disabled fd_stop_recv() to save syscalls.

The only reason for the io_available condition is to detect that an
asynchronous connect() just finished and will not be handled by any
registered event handler. Since we now properly handle synchronous
connects, we can detect this situation by the fact that we had a
success on conn_fd_check() and no requested I/O took over.
2020-03-04 19:29:12 +01:00
Willy Tarreau
667fefdc90 BUG/MEDIUM: connection: stop polling for sending when the event is ready
With commit 065a025610 ("MEDIUM: connection: don't stop receiving events
in the FD handler") we disabled a number of fd_stop_* in conn_fd_handler(),
in order to wait for their respective handlers to deal with them. But it
is not correct to do that for the send direction, as we may very well
have nothing to send. This is visible when connecting in TCP mode to
a server with no data to send, there's nobody anymore to disable the
polling for the send direction.

And it is logical, on the recv() path we know the system has data to
deliver and that some code will be in charge of it. On the send
direction we simply don't know if it was the result of a successful
connect() or if there is still something to send. In any case we
almost never fill the network buffer on a single send() after being
woken up by the system, so disabling the FD immediately or much later
will not change the number of operations.

No backport is needed, this is 2.2-dev.
2020-03-04 19:29:12 +01:00
Willy Tarreau
065a025610 MEDIUM: connection: don't stop receiving events in the FD handler
The remaining epoll_ctl() calls are exclusively caused by the disagreement
between conn_fd_handler() and the mux receiving the data: the fd handler
wants to stop after having woken up the tasklet, then the mux after
receiving data wants to receive again. Given that they don't happen in
the same poll loop when there are many FDs, this causes a lot of state
changes.

As suggested by Olivier, if the task is already scheduled for running,
we don't need to disable the event because it's in the run queue, poll()
cannot stop, and reporting it again will be harmless. What *might*
happen however is that a sampling-based poller like epoll() would report
many times the same event and has trouble getting others behind. But if
it would happen, it would still indicate the run queue has plenty of
pending operations, so it would in fact only displace the problem from
the poller to the run queue, which doesn't seem to be worse (and in
fact we do support priorities while the poller does not).

By doing this change, the keep-alive test with 1k conns and 100k reqs
completely gets rid of the per-request epoll_ctl changes, while still
not causing extra recvfrom() :

  $ ./h1load -n 100000 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20

  200000 sendto 1
  200000 recvfrom 1
   10762 epoll_wait 1
    3664 epoll_ctl 1
    1999 recvfrom -1

In close mode, it didn't change anything, we're still in the optimal
case (2 epoll per connection) :

  $ ./h1load -n 100000 -r 1 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20

  203764 epoll_ctl 1
  200000 sendto 1
  200000 recvfrom 1
    6091 epoll_wait 1
    2994 recvfrom -1
2020-02-28 16:17:09 +01:00