haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-10-29 07:31:00 +01:00

Author	SHA1	Message	Date
Amaury Denoyelle	1a58aca84e	MINOR: connection: use the srv pointer for the srv conn hash The pointer of the target server is used as the first parameter for the server connection hash calculation. This prevents the hash from being null when no specific parameters are present, and can serve as a simple defense against an attacker trying to reuse a non-conforming connection.	2021-02-12 12:33:05 +01:00
Amaury Denoyelle	81c6f76d3e	MINOR: connection: prepare hash calcul for server conns This is preliminary work for the calculation of the backend connection hash. A structure conn_hash_params is the input for the operation, containing the various specific parameters of a connection. The high bits of the hash will reflect the parameters present as input. A set of macros is written to manipulate the connection hash and extract the parameters/payload.	2021-02-12 12:33:05 +01:00
Amaury Denoyelle	f232cb3e9b	MEDIUM: connection: replace idle conn lists by eb trees The server idle/safe/available connection lists are replaced with ebmb-trees. This is used to store backend connections, with the new connection hash field as the key. The hash is an 8-byte field, used to reflect specific connection parameters. This is preliminary work to be able to reuse connections with SNI, explicit src/dst address or PROXY protocol.	2021-02-12 12:33:05 +01:00
Willy Tarreau	746b0515a4	MEDIUM: connection: make use of the control layer check_events/ignore_events This changes the subscribe/unsubscribe functions to rely on the control layer's check_events/ignore_events. At the moment only the socket version of these functions is present so the code should basically be the same.	2020-12-11 17:06:11 +01:00
Willy Tarreau	2ded48dd27	MINOR: connection: make conn_sock_drain() use the control layer's ->drain() Now we don't touch the fd anymore there, instead we rely on the ->drain() provided by the control layer. As such the function was renamed to conn_ctrl_drain().	2020-12-11 16:26:01 +01:00
Willy Tarreau	586f71b43f	REORG: connection: move the socket iocb (conn_fd_handler) to sock.c conn_fd_handler() is 100% specific to socket code. It's about time it moves to sock.c which manipulates socket FDs. With it comes conn_fd_check() which tests for the socket's readiness. The ugly connection status check at the end of the iocb was moved to an inlined function in connection.h so that if we need it for other socket layers it's not too hard to reuse. The code was really only moved and not changed at all.	2020-12-11 16:26:00 +01:00
Willy Tarreau	827fee7406	MINOR: connection: remove sock-specific code from conn_sock_send() The send() loop present in this function and the error handling is already present in raw_sock_from_buf(). Let's rely on it instead and stop touching the FD from this place. The send flag was changed to use a more agnostic CO_SFL_*. The name was changed to "conn_ctrl_send()" to remind that it's meant to be used to send at the lowest level.	2020-12-11 16:25:11 +01:00
Willy Tarreau	8b250ba738	CLEANUP: connection: open-code conn_cond_update_polling() and update the comment This last call to conn_cond_update_polling() is now totally misleading as the function only stops polling in case of unrecoverable connection error. Let's open-code the test to make it more prominent and explain what we're trying to do there. It's even almost certain this code is never executed anymore, as the only remaining case should be a mux's wake function setting CO_FL_ERROR without disabling the polling, but they need to be audited first to make sure this is the case.	2020-12-11 11:19:24 +01:00
Willy Tarreau	5a1d439225	CLEANUP: connection: use fd_stop_both() instead of conn_stop_polling() conn_stop_polling() in fact only calls fd_stop_both() after checking that the ctrl layer is ready. It's the case in conn_fd_check() so let's get rid of this next-to-last user of this function.	2020-12-11 09:56:53 +01:00
Willy Tarreau	38b4d2eb22	CLEANUP: connection: do not use conn->owner when the session is known At a few places we used to rely on conn->owner to retrieve the session while the session is already known. This is not correct because at some of these points the reason the connection's owner was still the session (instead of NULL) is a mistake. At one place a comparison is even made between the session and conn->owner assuming it's valid without checking if it's NULL. Let's clean this up to use the session all the time. Note that this will be needed for a forthcoming fix and will have to be backported.	2020-11-21 15:29:22 +01:00
Willy Tarreau	9b7587a6af	MINOR: connection: make sockaddr_alloc() take the address to be copied Roughly half of the calls to sockaddr_alloc() are made to copy an already known address. Let's optionally pass it as an argument so that the function can handle the copy at the same time; this slightly simplifies its usage.	2020-10-15 21:47:56 +02:00
Willy Tarreau	e53e7ec9d9	CLEANUP: protocol: remove the ->drain() function No protocol defines it anymore. The last user used to be the monitor-net stuff that got partially broken already when the tcp_drain() function moved to conn_sock_drain() with commit e215bba95 ("MINOR: connection: make conn_sock_drain() work for all socket families") in 1.9-dev2. A part of this will surely move back later when non-socket connections arrive with QUIC but better keep the API clean and implement what's needed in time instead.	2020-10-15 21:47:04 +02:00
Ilya Shipitsin	6b79f38a7a	CLEANUP: assorted typo fixes in the code and comments This is 12th iteration of typo fixes	2020-07-31 11:18:07 +02:00
Christopher Faulet	08016ab82d	MEDIUM: connection: Add private connections synchronously in session server list When a connection is marked as private, it is now added to the session server list. We don't wait for a stream to be detached from the mux to do so. When the connection is created, this happens after the mux creation. Otherwise, it is performed when the connection is marked as private. To allow that, when a connection is created, the session is systematically set as the connection owner. Thus, a backend connection always has an owner during its creation. And a private connection always has an owner until its death. Note that outside the detach() callback, if the call to session_add_conn() fails, the error is ignored. In this situation, we retry adding the connection to the session server list in the detach() callback. If this fails at this step, the multiplexer is destroyed and the connection is closed.	2020-07-15 14:08:14 +02:00
Christopher Faulet	2883fcf65b	BUG/MINOR: connection: See new connection as available only on reuse always When the multiplexer creation is delayed after the handshakes phase, the connection is added in the available connection list if http-reuse never is not configured for the backend. But it is a wrong statement. At this step, the connection is not safe because it is a new connection. So it must be added in the available connection list only if http-reuse always is used. No backport needed, this is 2.2-dev.	2020-07-07 14:31:01 +02:00
Christopher Faulet	aa27853ce2	BUG/MEDIUM: connection: Don't consider new private connections as available When a connection is created and the multiplexer is installed, if the connection is marked as private, don't consider it as available, regardless of the number of available streams. This test is performed when the mux is installed when the connection is created, in connect_server(), and when the mux is installed after the handshakes stage. No backport needed, this is 2.2-dev.	2020-07-07 14:30:38 +02:00
Willy Tarreau	4d82bf5c2e	MINOR: connection: align toremove_{lock,connections} and cleanup into idle_conns We used to have 3 thread-based arrays for toremove_lock, idle_cleanup, and toremove_connections. The problem is that these items are small, and that this creates false sharing between threads since it's possible to pack up to 8-16 of these values into a single cache line. This can cause real damage where there is contention on the lock. This patch creates a new array of struct "idle_conns" that is aligned on a cache line and which contains all three members above. This way each thread has access to its variables without hindering the other ones. Just doing this increased the HTTP/1 request rate by 5% on a 16-thread machine. The definition was moved to connection.{c,h} since it appeared a more natural evolution of the ongoing changes given that there was already one of them declared in connection.h previously.	2020-06-28 10:52:36 +02:00
Willy Tarreau	4cabfc18a3	BUG/MAJOR: connection: always disable ready events once reported This effectively reverts the two following commits: 6f95f6e11 ("OPTIM: connection: disable receiving on disabled events when the run queue is too high") 065a02561 ("MEDIUM: connection: don't stop receiving events in the FD handler") The problem as reported in issue #662 is that when the event signals the readiness of input data that has to be forwarded over a congested stream, the mux will read data and wake the stream up to forward them, but the buffer full condition makes this impossible immediately, then nobody in the chain will be able to disable the event after it was first reported. And given we don't know at the connection level whether an event was already reported or not, we can't decide anymore to forcefully stop it if for any reason its processing gets delayed. The problem is magnified in issue #662 by the fact that a shutdown is reported with pending data occupying the buffer. The shutdown will strike in loops and cause the upper layer stream to be notified until it's handled, but with a buffer full it's not possible to call cs_recv() hence to purge the event. All this can only be handled optimally by implementing a lower layer, direct mux-to-mux forwarding that will not require any scheduling. This way no wake up will be needed and the event will be instantly handled or paused for a long time. For now let's simply revert these optimizations. Running a 1 MB transfer test over H2 using 8 connections having each 32 streams with a limited link of 320 Mbps shows the following profile before this fix (calls per syscall, 100% CPU): 259878 epoll_wait; 519759 clock_gettime; 17277 sendto; 17129 recvfrom; 672 epoll_ctl. And the following one after the fix (calls per syscall, 2-3% CPU): 17201 sendto; 17357 recvfrom; 2304 epoll_wait; 4609 clock_gettime; 1200 epoll_ctl. Thus the behavior is much better. No backport is needed as these patches were only in 2.2-dev. Many thanks to William Dauchy for reporting a lot of details around this difficult issue.	2020-06-17 17:00:51 +02:00
Willy Tarreau	b2551057af	CLEANUP: include: tree-wide alphabetical sort of include files This patch fixes all the leftovers from the include cleanup campaign. There were not that many (~400 entries in ~150 files) but it was definitely worth doing it as it revealed a few duplicates.	2020-06-11 10:18:59 +02:00
Willy Tarreau	36979d9ad5	REORG: include: move the error reporting functions to from log.h to errors.h Most of the files dealing with error reports have to include log.h in order to access ha_alert(), ha_warning() etc. But while these functions don't depend on anything, log.h depends on a lot of stuff because it deals with log-formats and samples. As a result it's impossible not to embark long dependencies when using ha_warning() or qfprintf(). This patch moves these low-level functions to errors.h, which already defines the error codes used at the same places. About half of the users of log.h could be adjusted, sometimes revealing other issues such as missing tools.h. Interestingly the total preprocessed size shrunk by 4%.	2020-06-11 10:18:59 +02:00
Willy Tarreau	6be7849f39	REORG: include: move cfgparse.h to haproxy/cfgparse.h There's no point splitting the file in two since only cfgparse uses the types defined there. A few call places were updated and cleaned up. All of them were in C files which register keywords. There is nothing left in common/ now so this directory must not be used anymore.	2020-06-11 10:18:58 +02:00
Willy Tarreau	5e539c9b8d	REORG: include: move stream_interface.h to haproxy/stream_interface{,-t}.h Almost no changes, removed stdlib and added buf-t and connection-t to the types to avoid a warning.	2020-06-11 10:18:58 +02:00
Willy Tarreau	209108dbbd	REORG: include: move ssl_sock.h to haproxy/ssl_sock{,-t}.h Almost nothing changed, just moved a static inline at the end and moved an export from the types to the main file.	2020-06-11 10:18:58 +02:00
Willy Tarreau	7ea393d95e	REORG: include: move connection.h to haproxy/connection{,-t}.h The type file is becoming a mess, half of it is for the proxy protocol, another good part describes conn_streams and mux ops, it would deserve being split again. At least it was reordered so that elements are easier to find, with the PP-stuff left at the end. The MAX_SEND_FD macro was moved to compat.h as it's said to be the value for Linux.	2020-06-11 10:18:58 +02:00
Willy Tarreau	fc77454aff	REORG: include: move proto_tcp.h to haproxy/proto_tcp.h There was no type file. This one really is trivial. A few missing includes were added to satisfy the exported functions prototypes.	2020-06-11 10:18:58 +02:00
Willy Tarreau	e6ce10be85	REORG: include: move sample.h to haproxy/sample{,-t}.h This one is particularly tricky to move because everyone uses it and it depends on a lot of other types. For example it cannot include arg-t.h and must absolutely only rely on forward declarations to avoid dependency loops between vars -> sample_data -> arg. In order to address this one, it would be nice to split the sample_data part out of sample.h.	2020-06-11 10:18:58 +02:00
Willy Tarreau	762d7a5117	REORG: include: move frontend.h to haproxy/frontend.h There was no type file for this one, it only contains frontend_accept().	2020-06-11 10:18:57 +02:00
Willy Tarreau	0f6ffd652e	REORG: include: move fd.h to haproxy/fd{,-t}.h A few includes were missing in each file. A definition of struct polled_mask was moved to fd-t.h. The MAX_POLLERS macro was moved to defaults.h Stdio used to be silently inherited from whatever path but it's needed for list_pollers() which takes a FILE* and which can thus not be forward-declared.	2020-06-11 10:18:57 +02:00
Willy Tarreau	7a00efbe43	REORG: include: move common/namespace.h to haproxy/namespace{,-t}.h The type was moved out as it's used by standard.h for netns_entry. Instead of just being a forward declaration when not used, it's an empty struct, which makes gdb happier (the resulting stripped executable is the same).	2020-06-11 10:18:57 +02:00
Willy Tarreau	6131d6a731	REORG: include: move common/net_helper.h to haproxy/net_helper.h No change was necessary.	2020-06-11 10:18:57 +02:00
Willy Tarreau	58017eef3f	REORG: include: move the BUG_ON() code to haproxy/bug.h This one used to be stored into debug.h but the debug tools got larger and require a lot of other includes, which can't use BUG_ON() anymore because of this. It does not make sense and instead this macro should be placed into the lower includes and given its omnipresence, the best solution is to create a new bug.h with the few surrounding macros needed to trigger bugs and place assertions anywhere. Another benefit is that it won't be required to add include <debug.h> anymore to use BUG_ON, it will automatically be covered by api.h. No less than 32 occurrences were dropped. The FSM_PRINTF macro was dropped since not used at all anymore (probably since 1.6 or so).	2020-06-11 10:18:56 +02:00
Willy Tarreau	8d36697dee	REORG: include: move base64.h, errors.h and hash.h from common to to haproxy/ These ones do not depend on any other file. One used to include haproxy/api.h but that was solely for stddef.h.	2020-06-11 10:18:56 +02:00
Willy Tarreau	4c7e4b7738	REORG: include: update all files to use haproxy/api.h or api-t.h if needed All files that were including one of the following include files have been updated to only include haproxy/api.h or haproxy/api-t.h once instead: - common/config.h - common/compat.h - common/compiler.h - common/defaults.h - common/initcall.h - common/tools.h The choice is simple: if the file only requires type definitions, it includes api-t.h, otherwise it includes the full api.h. In addition, in these files, explicit includes for inttypes.h and limits.h were dropped since these are now covered by api.h and api-t.h. No other change was performed, given that this patch is large and affects 201 files. At least one (tools.h) was already freestanding and didn't get the new one added.	2020-06-11 10:18:42 +02:00
Christopher Faulet	3ab504f5ff	BUG/MEDIUM: connection: Ignore PP2 unique ID for stream-less connections It is possible to send a unique ID when the PROXY protocol v2 is used. It relies on the stream to do so. So we must be sure to have a stream. Locally initiated connections may not be linked to a stream. For instance, outgoing connections created by health checks have no stream. Moreover, the stream is not retrieved for mux-less connections (this bug will be fixed in another commit). Unfortunately, in make_proxy_line_v2() function, the stream is not tested before generating the unique-id. This bug leads to a segfault when a health check is performed for a server with the PROXY protocol v2 and the unique-id option enabled. It also crashes for servers using SSL connections with alpn. The bug was introduced by the commit cf6e0c8a8 ("MEDIUM: proxy_protocol: Support sending unique IDs using PPv2") This patch should fix the issue #640. It must be backported to the same versions as the commit above.	2020-05-26 17:36:01 +02:00
Willy Tarreau	119e50e0cc	MINOR: connection: add pp2-never-send-local to support old PP2 behavior A bug in the PROXY protocol v2 implementation was present in HAProxy up to version 2.1, causing it to emit a PROXY command instead of a LOCAL command for health checks. This is particularly minor but confuses some servers' logs. Sadly, the bug was discovered very late and revealed that some servers which possibly only tested their PROXY protocol implementation against HAProxy fail to properly handle the LOCAL command, and permanently remain in the "down" state when HAProxy checks them. When this happens, it is possible to enable this global option to revert to the older (bogus) behavior for the time it takes to contact the affected components' vendors and get them fixed. This option is disabled by default and acts on all servers having the "send-proxy-v2" statement. Older versions were reverted to the old behavior and should not attempt to be fixed by default again. However a variant of this patch could possibly be implemented to ask to explicitly send LOCAL if needed by some servers. More context here: https://www.mail-archive.com/haproxy@formilux.org/msg36890.html https://www.mail-archive.com/haproxy@formilux.org/msg37218.html	2020-05-22 13:55:32 +02:00
Christopher Faulet	14cd316a1f	MAJOR: checks: Use the best mux depending on the protocol for health checks When a tcp-check connect rule is evaluated, the mux protocol corresponding to the health-check is chosen. So for TCP based health-checks, the mux-pt is used. For HTTP based health-checks, the mux-h1 is used. The connection is marked as private to be sure not to reuse regular HTTP connections for health-checks. Connection reuse will be evaluated later. The functions evaluating HTTP send rules and expect rules have been updated to be HTX compliant. The main change for users is that HTTP health-checks are now stricter on the HTTP message format. While before, the HTTP formatting and parsing were minimalist, now messages should be well formatted.	2020-04-27 10:41:07 +02:00
Willy Tarreau	02c88036a6	BUG/MINOR: connection: always send address-less LOCAL PROXY connections Commit 7f26391bc5 ("BUG/MINOR: connection: make sure to correctly tag local PROXY connections") revealed that some implementations do not properly ignore addresses in LOCAL connections (at least Dovecot was spotted). More context information in the thread below: https://www.mail-archive.com/haproxy@formilux.org/msg36890.html The patch above was using LOCAL on top of local addresses in order to minimize the risk of breakage but revealed worse than a clean fix. So let's partially revert it and send pure LOCAL connections instead now. After a bit of observation, this patch should be progressively backported to stable branches. However if it reveals new breakage, the backport of the patch above will have to be reverted from stable branches while other products work on fixing their code based on the master branch.	2020-04-14 16:02:50 +02:00
Ilya Shipitsin	ce7b00f926	CLEANUP: assorted typo fixes in the code and comments This is fifth iteration of typo fixes	2020-03-31 17:09:35 +02:00
Olivier Houchard	f0d4dff25c	MINOR: connections: Make the "list" element a struct mt_list instead of list. Make the "list" element a struct mt_list, and explicitly use list_from_mt_list to get a struct list * where it is used as such, so that mt_list_for_each_entry will be usable with it.	2020-03-19 22:07:33 +01:00
Olivier Houchard	dc2f2753e9	MEDIUM: servers: Split the connections into idle, safe, and available. Revamp the server connection lists. We now have 3 lists: - idle_conns, which contains idling connections - safe_conns, which contains idling connections that are safe to use even for the first request - available_conns, which contains connections that are not idling, but can still accept new streams (those are HTTP/2 or fastcgi, and are always considered safe).	2020-03-19 22:07:33 +01:00
Tim Duesterhus	2b7f6c22d8	CLEANUP: connection: Stop directly setting an ist's .ptr Instead replace the complete `ist` by the value returned from `ist2`. This was noticed during review of issue #549.	2020-03-14 18:31:58 +01:00
Tim Duesterhus	a8692f3fe0	CLEANUP: connection: Add blank line after declarations in PP handling This adds the missing blank lines in `make_proxy_line_v2` and `conn_recv_proxy`. It also adjusts the type of the temporary variable used for the return value of `recv` to be `ssize_t` instead of `int`.	2020-03-13 17:26:43 +01:00
Tim Duesterhus	cf6e0c8a83	MEDIUM: proxy_protocol: Support sending unique IDs using PPv2 This patch adds the `unique-id` option to `proxy-v2-options`. If this option is set a unique ID will be generated based on the `unique-id-format` while sending the proxy protocol v2 header and stored as the unique id for the first stream of the connection. This feature is meant to be used in `tcp` mode. It works on HTTP mode, but might result in inconsistent unique IDs for the first request on a keep-alive connection, because the unique ID for the first stream is generated earlier than the others. Now that we can send unique IDs in `tcp` mode the `%ID` log variable is made available in TCP mode.	2020-03-13 17:26:43 +01:00
Tim Duesterhus	d1b15b6e9b	MINOR: proxy_protocol: Ingest PP2_TYPE_UNIQUE_ID on incoming connections This patch reads a proxy protocol v2 provided unique ID and makes it available using the `fc_pp_unique_id` fetch.	2020-03-13 17:25:23 +01:00
Tim Duesterhus	ba837ec367	CLEANUP: proxy_protocol: Use `size_t` when parsing TLVs Change `int` to `size_t` for consistency.	2020-03-06 11:16:19 +01:00
Tim Duesterhus	488ee7fb6e	BUG/MAJOR: proxy_protocol: Properly validate TLV lengths This patch fixes PROXYv2 parsing when the payload of the TCP connection is fused with the PROXYv2 header within a single recv() call. Previously HAProxy ignored the PROXYv2 header length when attempting to parse the TLV, possibly interpreting the first byte of the payload as a TLV type. This patch adds proper validation. It ensures that: 1. TLV parsing stops when the end of the PROXYv2 header is reached. 2. TLV lengths cannot exceed the PROXYv2 header length. 3. The PROXYv2 header ends together with the last TLV, not allowing for "stray bytes" to be ignored. A reg-test was added to ensure proper behavior. This patch tries to find the sweet spot between a small and easily backportable one, and a cleaner one that's more easily adaptable to older versions, hence why it merges the "if" and "while" blocks which causes a reindent of the whole block. It should be used as-is for versions 1.9 to 2.1, the block about PP2_TYPE_AUTHORITY should be dropped for 2.0 and the block about CRC32C should be dropped for 1.8. This bug was introduced when TLV parsing was added. This happened in commit b3e54fe387c7c1ea750f39d3029672d640c499f9. This commit was first released with HAProxy 1.6-dev1. A similar issue was fixed in commit 7209c204bd6f3c49132264c7a58f689cdc741c12. This patch must be backported to HAProxy 1.6+.	2020-03-06 11:11:22 +01:00
Willy Tarreau	6f95f6e111	OPTIM: connection: disable receiving on disabled events when the run queue is too high In order to save a lot on syscalls, we currently don't disable receiving on a file descriptor anymore if its handler was already woken up. But if the run queue is huge and the poller collects a lot of events, this causes excess wakeups which take CPU time which is not used to flush these tasklets. This patch simply considers the run queue size to decide whether or not to stop receiving. Tests show that by stopping receiving when the run queue reaches ~16 times its configured size, we can still hold maximal performance in extreme situations like maxpollevents=20k for runqueue_depth=2, and still totally avoid calling epoll_event under moderate load using default settings on keep-alive connections.	2020-03-04 19:29:12 +01:00
Willy Tarreau	8de5c4fa15	MEDIUM: connection: only call ->wake() for connect() without I/O We used to call ->wake() for any I/O event for which there was no subscriber. But this is a problem because this causes massive wake() storms since we disabled fd_stop_recv() to save syscalls. The only reason for the io_available condition is to detect that an asynchronous connect() just finished and will not be handled by any registered event handler. Since we now properly handle synchronous connects, we can detect this situation by the fact that we had a success on conn_fd_check() and no requested I/O took over.	2020-03-04 19:29:12 +01:00
Willy Tarreau	667fefdc90	BUG/MEDIUM: connection: stop polling for sending when the event is ready With commit 065a025610 ("MEDIUM: connection: don't stop receiving events in the FD handler") we disabled a number of fd_stop_* in conn_fd_handler(), in order to wait for their respective handlers to deal with them. But it is not correct to do that for the send direction, as we may very well have nothing to send. This is visible when connecting in TCP mode to a server with no data to send, there's nobody anymore to disable the polling for the send direction. And it is logical, on the recv() path we know the system has data to deliver and that some code will be in charge of it. On the send direction we simply don't know if it was the result of a successful connect() or if there is still something to send. In any case we almost never fill the network buffer on a single send() after being woken up by the system, so disabling the FD immediately or much later will not change the number of operations. No backport is needed, this is 2.2-dev.	2020-03-04 19:29:12 +01:00
Willy Tarreau	065a025610	MEDIUM: connection: don't stop receiving events in the FD handler The remaining epoll_ctl() calls are exclusively caused by the disagreement between conn_fd_handler() and the mux receiving the data: the fd handler wants to stop after having woken up the tasklet, then the mux after receiving data wants to receive again. Given that they don't happen in the same poll loop when there are many FDs, this causes a lot of state changes. As suggested by Olivier, if the task is already scheduled for running, we don't need to disable the event because it's in the run queue, poll() cannot stop, and reporting it again will be harmless. What might happen however is that a sampling-based poller like epoll() would report many times the same event and has trouble getting others behind. But if it would happen, it would still indicate the run queue has plenty of pending operations, so it would in fact only displace the problem from the poller to the run queue, which doesn't seem to be worse (and in fact we do support priorities while the poller does not). By doing this change, the keep-alive test with 1k conns and 100k reqs completely gets rid of the per-request epoll_ctl changes, while still not causing extra recvfrom(): $ ./h1load -n 100000 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20: 200000 sendto 1; 200000 recvfrom 1; 10762 epoll_wait 1; 3664 epoll_ctl 1; 1999 recvfrom -1. In close mode, it didn't change anything, we're still in the optimal case (2 epoll per connection): $ ./h1load -n 100000 -r 1 -t 4 -c 1000 -T 20 -F 127.0.0.1:8001/?s=1k/t=20: 203764 epoll_ctl 1; 200000 sendto 1; 200000 recvfrom 1; 6091 epoll_wait 1; 2994 recvfrom -1.	2020-02-28 16:17:09 +01:00


384 Commits