haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-10-13 00:21:20 +02:00

Author	SHA1	Message	Date
Willy Tarreau	50b659476c	BUG/MEDIUM: listener: only consider running threads when resuming listeners In bug #495 we found that it is possible to resume a listener on an inexistent thread. This happens when a bind's thread_mask contains bits out of the active threads mask, such as when using "1/odd" or "1/even". The thread_mask was used as-is to pick a thread number to re-enable the listener, and given that the highest number is used, 1/odd or 1/even can produce quite high thread numbers and crash the process by queuing some entries into non-existent lists. This bug is an incomplete fix of commit 413e926ba ("BUG/MAJOR: listener: fix thread safety in resume_listener()") though it will only trigger if some bind lines are explicitly bound to thread numbers higher than the thread count. The fix must be backported to all branches having the fix above (as far as 1.8, though the code is different there, see the commit message in 1.8 for changes). There are a few other places where bind_thread is used without enforcing all_thread_mask, namely when doing fd_insert() while creating listeners. It seems harmless but would probably deserve another fix.	2020-02-12 10:21:33 +01:00
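A minimal sketch of the idea behind this fix follows. It assumes 64-bit thread masks where bit 0 is the first thread. The bind's thread mask is intersected with the mask of running threads before the highest set bit is taken. `pick_resume_thread()` and `highest_bit()` are hypothetical names, and the latter only stands in for my_flsl(). `__builtin_clzll()` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit of a non-zero mask (stand-in for my_flsl()-1). */
static int highest_bit(uint64_t mask)
{
    assert(mask != 0);
    return 63 - __builtin_clzll(mask);
}

/* Hypothetical helper: picks the thread that should resume a listener.
 * Returns -1 if none of the bind's threads is actually running. */
static int pick_resume_thread(uint64_t bind_thread_mask, uint64_t all_threads_mask)
{
    uint64_t usable = bind_thread_mask & all_threads_mask; /* the essence of the fix */

    if (!usable)
        return -1;
    return highest_bit(usable);
}
```

Take 4 running threads (mask 0xF) and an odd-threads mask such as 0x5555555555555555. Without masking, the highest set bit is 62, which points to a thread that does not exist. With masking, it is 2.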
Willy Tarreau	eeea8082a8	BUG/MAJOR: listener: do not schedule a task-less proxy The seemingly harmless commit 0591bf7deb ("MINOR: listener: make the wait paths cleaner and more reliable") caused a nasty regression and revealed a rare race that hits regtest stickiness/lb-services.vtc about 4% of the time with 8 threads. The problem is that when a multi-threaded listener wakes up on an incoming connection, several threads can receive the event, especially when idle. All of them then race to accept the connections in parallel, adjusting the listener's nbconn and proxy's feconn until one reaches the proxy's limit and declines. At this step the changes are cancelled, the listener is marked "limited", and when the threads exit the function, one of them will unlimit the listener/proxy so that it can accept incoming connections again. The problem happens when many threads connect to a small peers section because its maxconn is very limited (typically 6 for 2 peers): enough competing threads may hit the limit, and one of them will then limit the listener and queue the proxy's task... except that peers do not initialize their proxy task since they do not use rate limiting. Thus the process crashes when doing task_schedule(p->task). Prior to the cleanup patch above, this didn't happen because the error path that was dedicated to only limiting the listener did not call task_schedule(p->task). Given that the proxy's task is optional, and that the expire value passed there is always TICK_ETERNITY, it's sufficient and reasonable to avoid calling this task_schedule() when expire is not set. For long-term safety we can also avoid doing it when the task is not set. A first fix consisted in allocating a task for the peers proxies, but it would never be used and would waste resources for no reason. No backport is needed as this commit was only merged into 2.2.	2020-01-08 19:39:09 +01:00
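The guard described above can be sketched as follows. This is a simplified model, not HAProxy's code. `struct task`, `struct proxy` and `schedule_proxy_wakeup()` are hypothetical reductions, and `task_schedule()` here is only a stand-in that records the date. `TICK_ETERNITY` is taken as 0, following HAProxy's ticks convention where 0 means "not set".

```c
#include <assert.h>
#include <stddef.h>

#define TICK_ETERNITY 0   /* assumption: 0 means "no expiration date" */

struct task  { int expire; };          /* hypothetical, heavily reduced */
struct proxy { struct task *task; };   /* peers proxies leave task NULL */

/* Stand-in for task_schedule(): only records the wakeup date. */
static void task_schedule(struct task *t, int when)
{
    t->expire = when;
}

/* Limit path: schedule the proxy's task only when there is both a real
 * expiration date and a task to schedule. Returns 1 if it scheduled. */
static int schedule_proxy_wakeup(struct proxy *p, int expire)
{
    if (expire == TICK_ETERNITY || p->task == NULL)
        return 0;
    task_schedule(p->task, expire);
    return 1;
}
```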
Willy Tarreau	cdcba115b8	BUG/MINOR: listener: do not immediately resume on transient error The listener supports a "transient error" situation, which corresponds to cases where accept() fails badly but poll() reports an event. This happens for example when a listener is paused, or when running out of FDs. The same mechanism is used when facing a maxconn or maxsessrate limitation. When this happens, the listener is disabled for up to 100ms and put back into the global listener queue so that it automatically wakes up again as soon as conditions change, either because an existing connection releases a resource or because the system recovers from a transient issue. The listener_accept() function has a bug in its exit path causing a freshly limited listener to be immediately enabled again because all the conditions are met (connection count < max). It doesn't take into account the fact that the listener might have been queued and must first wait for the timeout to expire before doing so. The impact is that upon certain errors, the faulty process will busy loop on the accept code without sleeping. This is the scenario reported and diagnosed by @hedong0411 in issue #382. This commit fixes it by verifying that the global queue's delay has expired before deciding to resume the listener. Another approach could consist in having an extra state like LI_DELAY for situations where only a delay is acceptable, but this would probably not bring anything except more complex code. This issue was introduced with the lock-free listener accept code (commits 3f0d02b and 82c9789a), which was backported to 1.8.20+ and 1.9.7+, so this fix must be backported to the relevant branches.	2019-12-11 15:06:30 +01:00
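A sketch of the exit-path decision described above follows. It assumes a wrap-safe tick comparison modelled on HAProxy's tick_is_expired(). `may_resume_listener()` is a hypothetical name, and treating `TICK_ETERNITY` (0) as "no delay pending" is an assumption of this sketch.

```c
#include <assert.h>

#define TICK_ETERNITY 0

/* Wrap-safe "has this date been reached?" test on 32-bit ticks. */
static int tick_is_expired(unsigned int timer, unsigned int now)
{
    return timer != TICK_ETERNITY && (int)(timer - now) <= 0;
}

/* Hypothetical exit-path check: resume only if below the limit AND the
 * global queue's back-off delay (up to 100ms) is no longer pending. */
static int may_resume_listener(int nbconn, int maxconn,
                               unsigned int queue_expire, unsigned int now)
{
    if (nbconn >= maxconn)
        return 0;
    if (queue_expire != TICK_ETERNITY && !tick_is_expired(queue_expire, now))
        return 0;   /* still waiting: resuming now would busy loop */
    return 1;
}
```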
Willy Tarreau	a1d97f88e0	REORG: listener: move the global listener queue code to listener.c The global listener queue code and declarations were still lying in haproxy.c while not needed there anymore at all. This complicates the code for no reason. As a result, the global_listener_queue_task and the global_listener_queue were made static.	2019-12-10 14:16:03 +01:00
Willy Tarreau	241797a3fc	MINOR: listener: split dequeue_all_listener() in two Half of its callers use it for the global_listener_queue and the other half for a proxy's queue, and this requires the callers to take care of these cases themselves. Let's split it in two versions: the current one, working only on the global queue, and another one dedicated to the per-proxy queues. This cleans up quite a bit of code.	2019-12-10 14:14:09 +01:00
Willy Tarreau	0591bf7deb	MINOR: listener: make the wait paths cleaner and more reliable In listener_accept() there are several situations where we have to wait for an event or a delay. These all implement their own call to limit_listener() and the associated task_schedule(). In addition to being ugly and confusing, one expire date computation is even wrong as it doesn't take into account the fact that we're using threads and that the value might change in the middle. Fortunately task_schedule() gets it right for us. This patch creates two jump locations, one for the global queue and one for the proxy queue, allowing the rest of the code to only compute the expire delay and jump to the right location.	2019-12-10 12:04:27 +01:00
Willy Tarreau	92079934a9	BUG/MEDIUM: listener/threads: fix a remaining race in the listener's accept() Recent fix 4c044e274c ("BUG/MEDIUM: listener/thread: fix a race when pausing a listener") is insufficient and only moves the race slightly farther. What now happens is that if we're limiting a listener due to a transient error such as an accept() error for example, or because the proxy's maxconn was reached, another thread might in the meantime have switched again to LI_READY, and at the end of the function we'll disable polling on this FD, resulting in a listener that never accepts anything anymore. It can more easily happen when sending SIGTTOU/SIGTTIN to temporarily pause the listeners to let another process bind next to them. What this patch does instead is to move all enable/disable operations to the end of the function and make them conditional on the state. The listener's state is checked under the lock and the FD's polling state adjusted accordingly so that the listener's state and the FD always remain 100% synchronized. It was verified with 16 threads that the cost of taking that lock is not measurable so that's fine. This should be backported to the same branches the patch above is backported to.	2019-12-10 10:43:31 +01:00
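A minimal sketch of "decide polling from the state while holding the lock" follows. A pthread mutex stands in for HAProxy's listener lock, and the `polling` flag stands in for the calls HAProxy would make, such as fd_want_recv() or fd_stop_recv(). The struct and the `listener_sync_polling()` name are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

enum li_state { LI_READY, LI_LIMITED, LI_PAUSED };   /* hypothetical subset */

struct listener {
    pthread_mutex_t lock;
    enum li_state state;
    int polling;          /* 1 = receive polling enabled on the FD */
};

/* Exit path of the accept function: the polling decision is derived from
 * the state under the same lock that protects state changes, so another
 * thread cannot flip the state between the check and the FD update. */
static void listener_sync_polling(struct listener *l)
{
    pthread_mutex_lock(&l->lock);
    l->polling = (l->state == LI_READY);
    pthread_mutex_unlock(&l->lock);
}
```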
Willy Tarreau	20aeb1c7cd	BUG/MINOR: listener: also clear the error flag on a paused listener When accept() fails because a listener is temporarily paused, the FD might have both FD_POLL_HUP and FD_POLL_ERR bits set. While we do not exploit FD_POLL_ERR here it's better to clear it because it is reported on "show fd" and is confusing. This may be backported to all versions.	2019-12-10 10:43:31 +01:00
Willy Tarreau	7cdeb61701	BUG/MINOR: listener/threads: always use atomic ops to clear the FD events There was a leftover from the single-threaded era when removing the FD_POLL_HUP flag from the listeners. By not using an atomic operation to clear the flag, another thread acting on the same listener might have lost some events, though this would have resulted in that thread reprocessing them immediately on the next loop pass. This should be backported as far as 1.8.	2019-12-10 10:43:31 +01:00
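The difference between the two rows above can be shown with C11 atomics. A plain read-modify-write on a non-atomic `unsigned int` (`ev &= ~X`) may overwrite a bit that another thread set between the read and the write. An atomic fetch-and cannot. The bit values below are hypothetical and need not match HAProxy's FD_POLL_* definitions.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical event bits; HAProxy's real FD_POLL_* values may differ. */
#define FD_POLL_IN  0x01u
#define FD_POLL_ERR 0x08u
#define FD_POLL_HUP 0x10u

/* Atomically drops HUP and ERR while preserving any bit that another
 * thread sets concurrently (e.g. FD_POLL_IN). Returns the old value. */
static unsigned int fd_clear_hup_err(atomic_uint *ev)
{
    return atomic_fetch_and(ev, ~(FD_POLL_HUP | FD_POLL_ERR));
}
```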
Willy Tarreau	4c044e274c	BUG/MEDIUM: listener/thread: fix a race when pausing a listener There exists a race in the listener code where a thread might disable receipt on a listener's FD then turn it to LI_PAUSED while at the same time another one faces EAGAIN on accept() and enables it again via fd_cant_recv(). The result is that the FD is in LI_PAUSED state with its polling still enabled. listener_accept() does not do anything then and doesn't disable the FD either, resulting in a thread eating all the CPU as reported in issue #358. A solution would be to take the listener's lock to perform the fd_cant_recv() call and do it only if the FD is still in LI_READY state, but this would be totally overkill while in practice the issue only happens during shutdown. Instead what is done here is that when leaving we recheck the state and disable polling if the listener is not in LI_READY state, which never happens except when being limited. In the worst case there could be one extra check per thread for the time required to converge, which is absolutely nothing. This fix was successfully tested, and should be backported to all versions using the lock-free listeners, which means all those containing commit 3f0d02bb ("MAJOR: listener: do not hold the listener lock in listener_accept()"), hence 2.1, 2.0, 1.9.7+, 1.8.20+.	2019-12-05 07:40:32 +01:00
Willy Tarreau	93604edb65	BUG/MEDIUM: listeners: always pause a listener on out-of-resource condition A corner case was opened in the listener_accept() code by commit 3f0d02bbc2 ("MAJOR: listener: do not hold the listener lock in listener_accept()"). The issue is when one listener (or a group of them) managed to eat all of the proxy's or the process's maxconn, and another listener tries to accept a new socket. The atomic increment then detects the excess connection count and immediately aborts without pausing the listener, so the call is immediately performed again. This doesn't happen when the test is run on a single listener because this listener got limited when crossing the limit. But with 2 or more listeners, we don't have this luxury. The solution consists in limiting the listener as soon as we have to decline accepting an incoming connection. This means that the listener will not be marked full yet if it gets the exact connection count, but this is not a problem in practice since all other listeners will only be marked full after their first attempt. Thus from now on, a listener is only full once it has already failed taking an incoming connection. This bug was definitely responsible for the unreproducible occasional reports of high CPU usage showing epoll_wait() returning immediately without accepting an incoming connection, like in bug #129. This fix must be backported to 1.9 and 1.8.	2019-11-15 10:34:51 +01:00
Willy Tarreau	2bd65a781e	OPTIM: listeners: use tasklets for the multi-queue rings Now that we can wake up a remote thread's tasklet, it's much more interesting to use a tasklet than a task in the accept queue, as it avoids passing through the whole scheduler. Just doing this increases the accept rate by about 4%, overall recovering the slight loss introduced by the tasklet change. In addition it makes sure that even a heavily loaded scheduler (e.g. many very fast checks) will not delay a connection accept.	2019-09-24 06:57:32 +02:00
Olivier Houchard	859dc80f94	MEDIUM: list: Separate "locked" list from regular list. Instead of using the same type for regular linked lists and "autolocked" linked lists, use a separate type, "struct mt_list", for the autolocked one, and introduce a set of macros, similar to the LIST_* macros, with the MT_ prefix. When we use the same entry for both a regular list and an autolocked list, as is done for the "list" field in struct connection, we now have to explicitly cast it to struct mt_list when using MT_ macros.	2019-09-23 18:16:08 +02:00
Christopher Faulet	ad6c2eac28	BUG/MINOR: listener: Fix a possible null pointer dereference It seems to be possible to have no frontend for a listener. A test was missing before dereferencing it at the end of the function listener_accept(). This patch fixes the issue #264. It must be backported to 2.0 and 1.9.	2019-09-10 10:29:54 +02:00
Willy Tarreau	6ee9f8df3b	BUG/MEDIUM: listener/threads: fix an AB/BA locking issue in delete_listener() The delete_listener() function takes the listener's lock before taking the proto_lock, which is contrary to what other functions do, possibly causing an AB/BA deadlock. In practice the only two places where both are taken are protocol_enable_all() and delete_listener(), the former being used during startup and the latter during stop. During reload floods, it is technically possible for a thread to be initializing the listeners while another one is stopping. While this is too hard to trigger on 2.0 and above due to the synchronization of all threads during startup, it's reasonably easy to do in 1.9 by having hundreds of listeners, starting 64 threads and flooding them with reloads like this : $ while usleep 50000; do killall -USR2 haproxy; done Usually in less than a minute, all threads will be deadlocked. The fix consists in always taking the proto_lock before the listener lock. It seems to be the only place where these two locks were reversed. This fix needs to be backported to 2.0, 1.9, and 1.8.	2019-08-26 11:07:09 +02:00
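The fix amounts to a global lock-ordering rule. This sketch uses pthread mutexes in place of HAProxy's spinlocks. The reduced `struct listener`, `enable_listener()` and this `delete_listener()` body are hypothetical. Only the ordering, proto_lock first and then the listener's lock, reflects the commit.

```c
#include <assert.h>
#include <pthread.h>

/* Global lock protecting the protocol lists (name borrowed from the commit). */
static pthread_mutex_t proto_lock = PTHREAD_MUTEX_INITIALIZER;

struct listener {                 /* hypothetical, reduced to what we need */
    pthread_mutex_t lock;
    int linked;                   /* 1 while attached to its protocol list */
};

/* Every path needing both locks takes them in the same order:
 * proto_lock first, then the listener's lock; release in reverse order. */
static void enable_listener(struct listener *l)
{
    pthread_mutex_lock(&proto_lock);
    pthread_mutex_lock(&l->lock);
    l->linked = 1;
    pthread_mutex_unlock(&l->lock);
    pthread_mutex_unlock(&proto_lock);
}

static void delete_listener(struct listener *l)
{
    pthread_mutex_lock(&proto_lock);   /* was taken second before the fix */
    pthread_mutex_lock(&l->lock);
    l->linked = 0;                     /* unlink from the protocol list */
    pthread_mutex_unlock(&l->lock);
    pthread_mutex_unlock(&proto_lock);
}
```

With a single ordering, no thread can hold the listener's lock while waiting for proto_lock, so the AB/BA cycle cannot form.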
Willy Tarreau	daacf36645	BUG/MEDIUM: protocols: add a global lock for the init/deinit stuff Dragan Dosen found that the listeners lock is not sufficient to protect the listeners list when proxies are stopping because the listeners are also unlinked from the protocol list, and under certain situations like bombing with soft-stop signals or shutting down many frontends in parallel from multiple CLI connections, it could be possible to provoke multiple instances of delete_listener() to be called in parallel for different listeners, thus corrupting the protocol lists. Such operations are pretty rare, they are performed once per proxy upon startup and once per proxy on shut down. Thus there is no point trying to optimize anything and we can use a global lock to protect the protocol lists during these manipulations. This fix (or a variant) will have to be backported as far as 1.8.	2019-07-24 16:45:02 +02:00
Willy Tarreau	f2cb169487	BUG/MAJOR: listener: fix thread safety in resume_listener() resume_listener() can be called from a thread not part of the listener's mask after a curr_conn has gone lower than a proxy's or the process' limit. This results in fd_may_recv() being called unlocked if the listener is bound to only one thread, and quickly locks up. This patch solves this by creating a per-thread work_list dedicated to listeners, and modifying resume_listener() so that it bounces the listener to the work_list of one of its owning threads and wakes that thread up. This thread will then call resume_listener() again and will perform the operation on the file descriptor itself. It is important to do it this way so that the listener's state cannot be modified while the listener is being moved, otherwise multiple threads can take conflicting decisions and the listener could be put back into the global queue if the listener was used at the same time. It seems like a slightly simpler approach would be possible if the locked list API provided the ability to return a locked element. In this case the listener would be immediately requeued in dequeue_all_listeners() without having to go through resume_listener() with its associated lock. This fix must be backported to all versions having the lock-less accept loop, which is as far as 1.8 since deadlock fixes involving this feature had to be backported there. It is expected that the code should not differ too much there. However, previous commit "MINOR: task: introduce work lists" will be needed as well and should not present difficulties either. For 1.8, the commits introducing thread_mask() and LIST_ADDED() will be needed as well, either backporting my_flsl() or switching to my_ffsl() will be OK, and some changes will have to be performed so that the init function is properly called (and maybe the deinit one can be dropped). 
In order to test the fix, simply set up a multi-threaded frontend with multiple bind lines each attached to a single thread (reproduced with 16 threads here), set up a very low maxconn value on the frontend, and inject heavy traffic on all listeners in parallel with slightly more connections than the configured limit (typically +20%) so that it flips very frequently. If the bug is still there, at some point (5-20 seconds) the traffic will go much lower or even stop, either with spinning threads or not.	2019-07-12 09:07:48 +02:00
Christopher Faulet	102854cbba	BUG/MEDIUM: listener: Fix how unlimited number of consecutive accepts is handled There is a bug when global.tune.maxaccept is set to -1 (no limit). It is pretty visible with one process (nbproc set to 1). The functions listener_accept() and accept_queue_process() don't expect to handle negative maxaccept values. So instead of accepting incoming connections without any limit, none are ever accepted and HAProxy loops infinitely in the scheduler. When there are 2 or more processes, the bug is a bit more subtle. The limit for a listener is set to 1, so only one connection is accepted at a time by a given listener. This happens because the listener's maxaccept value is an unsigned integer. In check_config_validity(), it is first set to UINT_MAX (-1 cast to an unsigned integer), and then some calculations on it lead to an integer overflow. To fix the bug, the listener's maxaccept value is now a signed integer. So, if a negative value is set for global.tune.maxaccept, we keep it untouched for the listener and no calculation is made on it. Then, in the listener code, this signed value is cast to an unsigned one, which simplifies all tests instead of dealing with negative values. As a result, the number of connections accepted at a time is limited to UINT_MAX at most, which honestly is not an issue. This patch must be backported to 1.9 and 1.8.	2019-04-30 15:28:29 +02:00
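The arithmetic trap and the signed-value fix can be illustrated as follows. The per-process split formula `(maxaccept + nbproc - 1) / nbproc` is a hypothetical stand-in, and the real computation in check_config_validity() may differ. The sketch only shows how unsigned arithmetic on UINT_MAX can collapse to a tiny value, and how keeping -1 untouched and casting at use time yields UINT_MAX.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical per-process split on an unsigned value: UINT_MAX + 1 wraps
 * to 0, so "no limit" turns into a tiny limit. */
static unsigned int split_unsigned(unsigned int maxaccept, unsigned int nbproc)
{
    return (maxaccept + nbproc - 1) / nbproc;
}

/* Fixed approach: keep the value signed and leave negative ("no limit")
 * values untouched. */
static int split_signed(int maxaccept, int nbproc)
{
    if (maxaccept < 0)
        return maxaccept;
    return (maxaccept + nbproc - 1) / nbproc;
}

/* At use time the signed value is cast to unsigned: -1 becomes UINT_MAX,
 * so the accept loop needs no special case for "unlimited". */
static unsigned int accept_budget(int listener_maxaccept)
{
    return (unsigned int)listener_maxaccept;
}
```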
Willy Tarreau	85d0424b20	BUG/MINOR: listener/mq: correctly scan all bound threads under low load When iterating on the CLI using "show activity" and no other load, it was visible that the last thread was always skipped. This was caused by the way the thread bits were walked: t1 was updated after t2 to make sure it never equals t2 (thus it skips t2), and in case of a tie we choose t1. As a result, the chosen thread is never t2 unless the other ones already have one connection. In addition to this, t2 was recalculated upon each pass because only bit 31 was looked at instead of the t2'th bit. This patch fixes this by updating t2 after t1 so that t1 is free to walk over all positions under equal load. No measurable performance gains are expected from this, but it at least removes one strange indicator which could lead to some suspicion. No backport is needed.	2019-04-16 18:09:13 +02:00
Willy Tarreau	64a9c05f37	MINOR: cli/listener: report the number of accepts on "show activity" The "show activity" command reports the number of incoming connections dispatched per thread but doesn't report the number of connections received by each thread. It is important to be able to monitor this value as it can show that for whatever reason a smaller set of threads is receiving the connections and dispatching them to all other ones.	2019-04-12 15:54:15 +02:00
Willy Tarreau	0d858446b6	BUG/MINOR: listener: renice the accept ring processing task It is not acceptable that the accept queues are handled with a normal priority since they are supposed to quickly dispatch the incoming traffic, resulting in tasks which will have their respective nice values and place in the queue. Let's renice the accept ring tasks to -1024. No backport is needed, this is strictly 2.0.	2019-04-12 15:54:03 +02:00
David Carlier	5671662f08	BUILD/MINOR: listener: Silent a few signedness warnings. Silencing a couple of warnings related to signedness, due to a mismatch of signed and unsigned ints with l->nbconn, actconn and p->feconn.	2019-03-27 17:37:44 +01:00
Willy Tarreau	57cb506df8	BUILD: listener: shut up a build warning when threads are disabled We get this with __decl_hathreads due to the lone semi-colon, let's move it to the end of the innermost declaration : src/listener.c: In function 'listener_accept': src/listener.c:601:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]	2019-03-15 17:17:33 +01:00
Willy Tarreau	897e2c58e6	BUG/MEDIUM: listener: make sure we don't pick stopped threads Dragan Dosen reported that after the multi-queue changes, appending "process 1/even" on a bind line can make the process immediately crash when delivering a first connection. This is due to the fact that I believed that thread_mask(mask) applied the all_threads_mask value, but it doesn't. And in case of even/odd the bits cover more than the available threads, resulting in too high a thread number being selected and a non-existing task to be woken up. No backport is needed.	2019-03-13 15:03:53 +01:00
Olivier Houchard	64213e910d	MEDIUM: listeners: Use the new _HA_ATOMIC_* macros. Use the new _HA_ATOMIC_* macros and add barriers where needed.	2019-03-11 17:02:38 +01:00
Olivier Houchard	a51885621d	BUG/MEDIUM: listeners: Don't call fd_stop_recv() if fd_updt is NULL. In do_unbind_listener, don't bother calling fd_stop_recv() if fd_updt is NULL. It means it has already been freed, and it would crash.	2019-03-08 16:05:31 +01:00
Willy Tarreau	0cf33176bd	MINOR: listener: move thr_idx from the bind_conf to the listener Tests show that it's slightly faster to have this field in the listener. The cache walk patterns are under heavy stress and having only this field written to in the bind_conf was wasting a cache line that was heavily read. Let's move this close to the other entries already written to in the listener. Warning, the position does have an impact on peak performance.	2019-03-07 14:08:26 +01:00
Willy Tarreau	9f1d4e7f7f	CLEANUP: listener: remove old thread bit mapping Now that the P2C algorithm for the accept queue is removed, we don't need to map a number to a thread bit anymore, so let's remove all these fields which are taking quite some space for no reason.	2019-03-07 13:59:04 +01:00
Willy Tarreau	0fe703bd50	MEDIUM: listener: change the LB algorithm again to use two round robins instead At this point, the random used in the hybrid queue distribution algorithm provides little benefit over a periodic scan, can even have a slightly worse worst case, and it requires establishing a mapping between a discrete number and a thread ID among a mask. This patch introduces a different approach using two indexes. One scans the thread mask from the left, the other one from the right. The related threads' loads are compared, and the least loaded one receives the new connection. Then one index is adjusted depending on the load resulting from this election, so that we start the next election from two known lightly loaded threads. This approach provides an extra 1% peak performance boost over the previous one, which likely corresponds to the removal of the extra work on the random and the previously required two mappings of index to thread. A test was attempted with two indexes going in the same direction but it was much less interesting because the same thread pairs were compared most of the time with the load climbing in a ladder-like model. With the reverse directions this cannot happen.	2019-03-07 13:57:33 +01:00
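A simplified, single-threaded sketch of the two-index election follows. It also applies the later fix that updates t2 after t1 so that t1 can walk all positions. The `struct rr2` state, the helpers and the tie rule (t1 wins and moves on) are written for illustration and are not HAProxy's code. The real version packs both indexes into one word and updates them with CAS.

```c
#include <assert.h>

#define MAX_THREADS 64

struct rr2 {      /* hypothetical state of the two walking indexes */
    int t1;       /* walks the mask upwards */
    int t2;       /* walks the mask downwards */
};

/* First bound thread at or above <t>, wrapping around (mask != 0). */
static int next_up(unsigned long long mask, int t)
{
    for (int i = 0; i < MAX_THREADS; i++) {
        int c = ((t + i) % MAX_THREADS + MAX_THREADS) % MAX_THREADS;
        if (mask & (1ULL << c))
            return c;
    }
    return -1;
}

/* First bound thread at or below <t>, wrapping around (mask != 0). */
static int next_down(unsigned long long mask, int t)
{
    for (int i = 0; i < MAX_THREADS; i++) {
        int c = ((t - i) % MAX_THREADS + MAX_THREADS) % MAX_THREADS;
        if (mask & (1ULL << c))
            return c;
    }
    return -1;
}

/* Elects the less loaded candidate and moves the index that must not stay. */
static int rr2_pick(struct rr2 *s, unsigned long long mask, const int *load)
{
    int t1 = next_up(mask, s->t1);
    int t2 = next_down(mask, s->t2);

    if (t1 == t2)                  /* t2 is updated after t1 to differ */
        t2 = next_down(mask, t2 - 1);

    if (load[t1] < load[t2]) {     /* t1 wins: keep it, move t2 on */
        s->t1 = t1;
        s->t2 = t2 - 1;
        return t1;
    }
    if (load[t1] > load[t2]) {     /* t2 wins: keep it, move t1 on */
        s->t1 = t1 + 1;
        s->t2 = t2;
        return t2;
    }
    s->t1 = t1 + 1;                /* tie: t1 wins and becomes busier */
    s->t2 = t2;
    return t1;
}
```

Under equal load, four picks on mask 0xF visit threads 0, 1, 2 and 3 in turn. The pre-fix behaviour in the row above skipped the last thread in this situation.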
Willy Tarreau	fc630bd373	MINOR: listener: improve incoming traffic distribution By picking two randoms following the P2C algorithm, we seldom observe asymmetric loads on bursts of small session counts. This is typically what makes h2load take a bit of time to complete the last 100%, because if a thread gets two connections while the other ones only have one, it takes twice the time to complete its work. This patch proposes a modification of the p2c algorithm which seems more suitable to this case: it mixes a rotating index with a random. This way, we're certain that all threads are consulted in turn, and at the same time we're not forced to use the ones we're giving a chance. This significantly increases the traffic rate. Now h2load shows faster completion, and the average request rates on H2 and the TLS resume rate increase by a bit more than 5% compared to pure p2c. The index was placed into the struct bind_conf because it's faster there and it's the best place to optimally distribute traffic among a group of listeners. It's the only runtime-modified element there and it will be quite cache-hot.	2019-03-07 13:48:04 +01:00
Willy Tarreau	a8cf66bcab	MINOR: listener: do not needlessly set l->maxconn It's pointless to always set and maintain l->maxconn because the accept loop already enforces the frontend's limit anyway. Thus let's stop setting this value by default and keep it to zero meaning "no limit". This way the frontend's maxconn will be used by default. Of course if a value is set, it will be enforced.	2019-02-28 17:05:32 +01:00
Willy Tarreau	e2711c7bd6	MINOR: listener: introduce listener_backlog() to report the backlog value In an attempt to try to provide automatic maxconn settings, we need to decorrelate a listener's backlog and maxconn so that these values can be independent. This introduces a listener_backlog() function which retrieves the backlog value from the listener's backlog, the frontend's, the listener's maxconn, the frontend's or falls back to 1024. This corresponds to what was done in cfgparse.c to force a value there except the last fallback which was not set since the frontend's maxconn is always known.	2019-02-28 17:05:29 +01:00
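The fallback chain described above can be sketched with reduced, hypothetical structures where 0 means "not set". HAProxy's real listener reaches its frontend through other fields, and its listener_backlog() may be written differently. Only the order of the fallbacks comes from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced structures: 0 means "not set". */
struct proxy    { int backlog; int maxconn; };
struct listener { int backlog; int maxconn; struct proxy *fe; };

/* Listener backlog, frontend backlog, listener maxconn, frontend
 * maxconn, then 1024 as a last resort. */
static int listener_backlog(const struct listener *l)
{
    if (l->backlog)
        return l->backlog;
    if (l->fe && l->fe->backlog)
        return l->fe->backlog;
    if (l->maxconn)
        return l->maxconn;
    if (l->fe && l->fe->maxconn)
        return l->fe->maxconn;
    return 1024;
}
```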
Willy Tarreau	82c9789ac4	BUG/MEDIUM: listener: make sure the listener never accepts too many conns We were not checking p->feconn nor the global actconn soon enough. In older versions this could result in a frontend accepting more connections than allowed by its maxconn or the global maxconn, exactly N-1 extra connections where N is the number of threads, provided each of these threads was running a different listener. But with the lock removal, it became worse: the excess could be the listener's maxconn multiplied by the number of threads. Among the nasty side effects, LI_FULL could be removed while the limit was still exceeded, and in some cases polling on the socket was not re-enabled. This commit takes care of updating and checking p->feconn and the global actconn before processing the connection, so that the listener can be turned off before accepting the socket if needed. This requires moving some of the bookkeeping operations from session to listener, which totally makes sense in this context. Now the limits are properly respected, even if a listener's maxconn is over a frontend's. This only applies on top of the listener lock removal series and doesn't have to be backported.	2019-02-28 16:08:54 +01:00
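A sketch of "reserve the slots before accept(), roll back on refusal" using C11 compare-and-swap loops follows. It also limits the listener as soon as it declines, as in the out-of-resource fix listed earlier in this log. The function names, the state enum and the "0 means no limit" convention are assumptions of this sketch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Reserves one slot in <count> unless <max> is reached (0 = no limit).
 * Returns 1 on success, 0 if the limit is reached. */
static int reserve_slot(atomic_int *count, int max)
{
    int cur = atomic_load(count);

    do {
        if (max && cur >= max)
            return 0;
    } while (!atomic_compare_exchange_weak(count, &cur, cur + 1));
    return 1;
}

enum { LI_READY, LI_LIMITED };   /* hypothetical subset of listener states */

/* Hypothetical pre-accept bookkeeping: listener, then frontend, then
 * global. A refusal undoes earlier reservations and limits the listener
 * immediately, before accept() is ever called. */
static int reserve_before_accept(atomic_int *l_nbconn, int l_max,
                                 atomic_int *fe_feconn, int fe_max,
                                 atomic_int *actconn, int global_max,
                                 int *l_state)
{
    if (!reserve_slot(l_nbconn, l_max))
        goto limit;
    if (!reserve_slot(fe_feconn, fe_max))
        goto undo_listener;
    if (!reserve_slot(actconn, global_max))
        goto undo_frontend;
    return 1;

undo_frontend:
    atomic_fetch_sub(fe_feconn, 1);
undo_listener:
    atomic_fetch_sub(l_nbconn, 1);
limit:
    *l_state = LI_LIMITED;
    return 0;
}
```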
Willy Tarreau	01abd02508	BUG/MEDIUM: listener: use a self-locked list for the dequeue lists There is a very difficult to reproduce race in the listener's accept code, which is much easier to reproduce once connection limits are properly enforced. It's an ABBA lock issue : - the following functions take l->lock then lq_lock : disable_listener, pause_listener, listener_full, limit_listener, do_unbind_listener - the following ones take lq_lock then l->lock : resume_listener, dequeue_all_listener This is because __resume_listener() only takes the listener's lock and expects to be called with lq_lock held. The problem can easily happen when listener_full() and limit_listener() are called a lot while in parallel another thread releases sessions for the same listener using listener_release() which in turn calls resume_listener(). This scenario is more prevalent in 2.0-dev since the removal of the accept lock in listener_accept(). However in 1.9 and before, a different but extremely unlikely scenario can happen : (1) thread1 enters listener_accept() while thread2 calls limit_listener(); (2) thread1 pauses for a long time before taking the lock, while thread2 calls session_free(), then dequeue_all_listeners(), and takes lq_lock [1]; (3) thread1 takes l->lock via try_lock() [2], while thread2 enters __resume_listener() and waits in spin_lock(l->lock) on [2]; (4) thread1 calls accept() then l->accept(), finds nbconn==maxconn and calls listener_full(), which sees state==LI_LIMITED and calls lock(lq_lock), deadlocking on [1]. In practice it is almost impossible to trigger because it requires limiting on both the listener's maxconn and the frontend's rate limit at the same time, and releasing the listener when the connection rate goes below the limit between the moment poll() returns the FD and the moment the lock is taken (a few nanoseconds). But maybe with threads competing on the same core it has more chances to appear. 
This patch removes the lq_lock and replaces it with a lockless queue for the listener's wait queue (well, technically speaking a self-locked queue) brought by commit a8434ec14 ("MINOR: lists: Implement locked variations.") and its few subsequent fixes. This relieves us of the need for the lq_lock and removes the deadlock. It also gets rid of the distinction between __resume_listener() and resume_listener() since the only difference was the lq_lock. All listener removals from the list are now unconditional to avoid races on the state. It's worth noting that the list was never initialized and only worked thanks to the state tests, so the initialization has now been added. This patch must carefully be backported to 1.9 and very likely 1.8. It is mandatory to be careful about replacing all manipulations of l->wait_queue, global.listener_queue and p->listener_queue.	2019-02-28 16:08:54 +01:00
Willy Tarreau	7ac908bf8c	MINOR: config: add global tune.listener.multi-queue setting tune.listener.multi-queue { on | off } Enables ('on') or disables ('off') the listener's multi-queue accept which spreads the incoming traffic to all threads a "bind" line is allowed to run on instead of taking them for itself. This provides a smoother traffic distribution and scales much better, especially in environments where threads may be unevenly loaded due to external activity (network interrupts colliding with one thread for example). This option is enabled by default, but it may be forcefully disabled for troubleshooting or for situations where it is estimated that the operating system already provides a good enough distribution and connections are extremely short-lived.	2019-02-27 14:27:07 +01:00
Willy Tarreau	8a03408d81	MINOR: activity: add accept queue counters for pushed and overflows It's important to monitor the accept queues to know if some incoming connections had to be handled by their originating thread due to an overflow. It's also important to be able to confirm thread fairness. This patch adds "accq_pushed" to activity reporting, which reports the number of connections that were successfully pushed into each thread's queue, and "accq_full", which indicates the number of connections that couldn't be pushed because the thread's queue was full.	2019-02-27 14:27:07 +01:00
Willy Tarreau	e0e9c48ab2	MAJOR: listener: use the multi-queue for multi-thread listeners The idea is to redistribute an incoming connection to one of the threads a bind_conf is bound to when there is more than one. We do this using a random improved by the p2c algorithm : a random() call returns two different thread numbers. We then compare their respective connection count and the length of their accept queues, and pick the least loaded one. We even use this deferred accept mechanism if the target thread ends up being the local thread, because this maintains fairness between all connections and tests show that it's about 1% faster this way, likely due to cache locality. If the target thread's accept queue is full, the connection is accepted synchronously by the current thread.	2019-02-27 14:27:07 +01:00
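A sketch of the power-of-two-choices election described above follows. It draws two distinct bound threads at random and keeps the one with the smaller load, where load means connection count plus accept-queue length. A small xorshift32 generator replaces random() so the sketch is reproducible, and its seed must be non-zero. `p2c_pick()` and `nth_thread()` are hypothetical names, and `__builtin_popcountll()` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Deterministic PRNG standing in for random(); seed must be non-zero. */
static uint32_t xorshift32(uint32_t *s)
{
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/* Thread ID of the n-th set bit of <mask>, n counted from 0. */
static int nth_thread(uint64_t mask, unsigned int n)
{
    for (int t = 0; t < 64; t++)
        if (((mask >> t) & 1) && n-- == 0)
            return t;
    return -1;
}

/* Draws two distinct bound threads and keeps the less loaded one. */
static int p2c_pick(uint64_t mask, const int *load, uint32_t *seed)
{
    unsigned int n = (unsigned int)__builtin_popcountll(mask);
    unsigned int r1, r2;
    int t1, t2;

    if (n < 2)
        return n ? nth_thread(mask, 0) : -1;
    r1 = xorshift32(seed) % n;
    r2 = xorshift32(seed) % (n - 1);
    if (r2 >= r1)
        r2++;                       /* skip r1 so the two draws differ */
    t1 = nth_thread(mask, r1);
    t2 = nth_thread(mask, r2);
    return load[t2] < load[t1] ? t2 : t1;
}
```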
Willy Tarreau	1efafce61f	MINOR: listener: implement multi-queue accept for threads There is one point where we can migrate a connection to another thread without taking any risk: when we accept it. The new FD is not yet in the fd cache and no task has been created yet, so it's still possible to assign it a different thread than the one which accepted the connection. The only requirement for this is to have one accept queue per thread, each with a processing task that has to be woken up every time an entry is added to the queue. This is a multiple-producer, single-consumer model. Entries are added at the queue's tail and the processing task is woken up. The consumer picks entries at the head and processes them in order. The accept queue contains the fd, the source address, and the listener. Each entry of the accept queue is rounded up to 64 bytes (one cache line) to avoid cache aliasing, because tests have shown that performance otherwise suffers a lot (about 5%). A test has shown that it's important to have at least 256 entries for the rings, as with 128 they can still fill up often under high load with small thread counts. The processing task does almost nothing except calling the listener's accept() function and updating the global session and SSL rate counters, just like listener_accept() does on synchronous calls. At this point the accept queue is implemented but not used.	2019-02-27 14:27:07 +01:00
Willy Tarreau	b2b50a7784	MINOR: listener: pre-compute some thread counts per bind_conf In order to quickly pick a thread ID when accepting a connection, we'll need to know certain pre-computed values derived from the thread mask, namely the counts of bits set in each aligned group of 1, 2, 4, 8, 16 and 32 bits. In practice it is sufficient to compute only the first four and store them in the bind_conf. We update the counts every time the bind_thread value is adjusted. The fields in the bind_conf struct have been moved around a little bit to make it easier to group all thread bit values into the same cache line. The function used to return a thread number is bind_map_thread_id(), and it maps a number between 0 and 31/63 to a thread ID between 0 and 31/63, starting from the left.	2019-02-27 14:27:07 +01:00
Willy Tarreau	9e85318417	MINOR: listener: maintain a per-thread count of the number of connections on a listener Having this information will help us improve thread-level distribution of incoming traffic.	2019-02-27 14:27:07 +01:00
Willy Tarreau	3f0d02bbc2	MAJOR: listener: do not hold the listener lock in listener_accept() This function used to hold the listener's lock as a way to stay safe against concurrent manipulations, but it turns out this is wrong. First, the lock is held during l->accept(), which itself might indirectly call listener_release(), which, if the listener is marked full, could result in __resume_listener() being called and the lock being taken twice. In practice it doesn't happen right now because the listener's FULL state cannot change while we're doing this. Second, everything this code does is now protected against concurrent accesses, which was not the case in the early days of threads: the frequency counters are thread-safe, and the rate limiting doesn't require extreme precision. Only the nbconn check is not thread safe. Third, the parts called here will have to be called from different threads without holding this lock, and this becomes a bigger issue if we need to keep this one. This patch does 3 things which need to be addressed at once: 1) it moves the lock to the only 2 functions called from listener_accept() that were not protected: - limit_listener() - listener_full() 2) it makes sure delete_listener() properly checks its state within the lock. 3) it updates the l->nbconn tracking to make sure that it is always properly reported and accounted for. There is a point of particular care around the situation where the listener's maxconn is reached, because the listener has to be marked full before accepting the connection, then resumed if the connection finally gets dropped. It is not possible to perform this change without removing the lock due to the deadlock issue explained above. This patch almost doubles the accept rate in multi-threaded mode on a port shared between 8 threads, and multiplies by 4 the connection rate on a tcp-request connection reject rule.	2019-02-27 14:27:07 +01:00
Willy Tarreau	a36b324777	MEDIUM: listener: keep a single thread-mask and warn on "process" misuse Now that nbproc and nbthread are exclusive, we can still provide more detailed explanations about what we've found in the config when a bind line appears on multiple threads and processes at the same time, then ignore the setting. This patch reduces the listener's thread mask to a single mask instead of an array of masks per process. Now we have only one thread mask and one process mask per bind-conf. This removes ~504 bytes of RAM per bind-conf and will simplify handling of thread masks. If a "bind" line only refers to process numbers not found by its parent frontend or not covered by the global nbproc directive, or to a thread not covered by the global nbthread directive, a warning is emitted saying what will be used instead.	2019-02-27 14:27:07 +01:00
Willy Tarreau	741b4d6b7a	BUG/MINOR: listener: keep accept rate counters accurate under saturation The test on l->nbconn forces an exit from the loop before the freq counters are updated, so the last session which reaches a listener's limit is not accounted for in the session rate measurement. Let's move the test to the beginning of the loop and mark the listener as saturated on exit. This may be backported to 1.9 and 1.8.	2019-02-27 08:03:41 +01:00
Olivier Houchard	d16a9dfed8	BUG/MAJOR: listener: Make sure the listener exist before using it. In listener_accept(), make sure we have a listener before attempting to use it. Another thread may have closed the FD meanwhile, and set fdtab[fd].owner to NULL. As the listener is not freed, it is ok to attempt to accept() a new connection even if the listener was closed. At worst the fd has been reassigned to another connection, and accept() will fail anyway. Many thanks to Richard Russo for reporting the problem, and suggesting the fix. This should be backported to 1.9 and 1.8.	2019-02-25 16:30:13 +01:00
Willy Tarreau	ff9c9140f4	MINOR: config: make MAX_PROCS configurable at build time For some embedded systems, it's pointless to have 32- or even 64-entry arrays of processes when it's known that much fewer processes will be used in the worst case. Let's introduce this MAX_PROCS define which contains the highest number of processes allowed to run at once. It still defaults to LONGBITS but may be lowered.	2019-02-07 15:10:19 +01:00
Willy Tarreau	6daac19b3f	MINOR: config: simplify bind_proc processing using proc_mask() At a number of places we used to have null tests on bind_proc for listeners and proxies. Let's simplify all these tests by always having the proper bits reported via proc_mask().	2019-02-04 05:09:16 +01:00
Willy Tarreau	bbcf2b9e0d	BUG/MINOR: threads: fix the process range of thread masks Commit 421f02e ("MINOR: threads: add a MAX_THREADS define instead of LONGBITS") used a MAX_THREADS macro to fix thread limits. However, one change was wrong as it affected the upper bound of the process loop when setting thread masks. No backport is needed.	2019-02-02 13:18:01 +01:00
Willy Tarreau	888d5678f7	BUG/MINOR: listener: always fill the source address for accepted socketpairs The source address was not set but passed down the chain to the upper layer's accept() calls. Let's initialize it like other UNIX sockets in this case. At the moment it should not have any impact since socketpairs are only usable for the master CLI. This should be backported to 1.9.	2019-01-27 21:48:29 +01:00
Willy Tarreau	c9a82e48bf	MINOR: cfgparse: make the process/thread parser support a maximum value It was hard-wired to LONGBITS, let's make it configurable depending on the context (threads, processes).	2019-01-26 13:25:14 +01:00
Willy Tarreau	76a551de2e	MINOR: config: make sure to associate the proper mux to bind and servers Currently a mux may be forced on a bind or server line by specifying the "proto" keyword. The problem is that the mux may depend on the proxy's mode, which is not known when parsing this keyword, so a wrong mux could be picked. Let's simply update the mux entry while checking its validity. We do have the name and the side, we only need to see if a better mux fits based on the proxy's mode. It also requires removing the side check when parsing the "proto" keyword since a wrong mux could be picked. This way it becomes possible to declare multiple muxes with the same protocol names and different sides or modes.	2018-12-02 13:29:35 +01:00
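
A minimal Python sketch of the accounting described in commit 8a03408d81 (accq_pushed / accq_full), combined with the overflow fallback from commit e0e9c48ab2. The class and function names are hypothetical, the queue capacity is an arbitrary assumption, and charging both outcomes to the target thread is one reading of the commit message rather than a copy of HAProxy's code.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadActivity:
    # Hypothetical per-thread counters named after the activity fields
    # introduced by commit 8a03408d81.
    accq_pushed: int = 0   # connections successfully queued to this thread
    accq_full: int = 0     # connections refused because this thread's queue was full


@dataclass
class AcceptQueue:
    # Hypothetical bounded per-thread queue; the capacity is an assumption.
    capacity: int
    entries: list = field(default_factory=list)

    def try_push(self, item) -> bool:
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(item)
        return True


def dispatch(fd, target, queues, activity, accept_locally):
    """Queue fd to the target thread, or accept it on the current thread on overflow."""
    if queues[target].try_push(fd):
        activity[target].accq_pushed += 1
    else:
        activity[target].accq_full += 1
        accept_locally(fd)   # synchronous accept by the originating thread
```

Watching accq_full grow relative to accq_pushed is what tells you that connections are falling back to their originating thread.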
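
The power-of-two-choices selection from commit e0e9c48ab2 can be sketched as follows. This is an illustration under stated assumptions: per-thread connection counts and accept-queue lengths are plain lists indexed by thread ID, and the two figures are simply summed into one load value, which is one possible way to compare "their respective connection count and the length of their accept queues", not necessarily HAProxy's exact rule.

```python
import random


def pick_thread(threads, conn_count, queue_len, rng=random):
    """Power-of-two-choices: sample two distinct threads, keep the less loaded one.

    threads    -- thread IDs the bind line may run on
    conn_count -- per-thread connection counts, indexed by thread ID (assumption)
    queue_len  -- per-thread accept-queue lengths, indexed by thread ID (assumption)
    """
    if len(threads) == 1:
        return threads[0]
    t1, t2 = rng.sample(threads, 2)          # two different thread numbers
    load1 = conn_count[t1] + queue_len[t1]   # summing both figures is an assumption
    load2 = conn_count[t2] + queue_len[t2]
    return t1 if load1 <= load2 else t2
```

Because only two candidates are compared, a thread that is strictly the most loaded one can never be selected, yet no scan over all threads is needed.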
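
The head/tail logic of the per-thread accept queue from commit 1efafce61f can be sketched as a bounded ring with free-running indices. This sketch is single-threaded and not safe for concurrent producers; the queue in the commit is multi-producer/single-consumer with 64-byte entries, which Python cannot reproduce. The power-of-two size requirement and all names here are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Optional

ACCEPT_QUEUE_SIZE = 256   # the commit reports 256 entries as the useful minimum


@dataclass
class AcceptEntry:
    fd: int
    addr: tuple          # source address, e.g. ("192.0.2.1", 40000)
    listener: object     # opaque listener reference


class AcceptRing:
    """Bounded FIFO: producers add at the tail, one consumer takes from the head."""

    def __init__(self, size: int = ACCEPT_QUEUE_SIZE):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.slots = [None] * size
        self.mask = size - 1
        self.head = 0   # next slot the consumer reads
        self.tail = 0   # next slot a producer writes

    def push(self, entry: AcceptEntry) -> bool:
        if self.tail - self.head == len(self.slots):
            return False                      # full: caller accepts synchronously
        self.slots[self.tail & self.mask] = entry
        self.tail += 1
        return True

    def pop(self) -> Optional[AcceptEntry]:
        if self.head == self.tail:
            return None                       # empty
        entry = self.slots[self.head & self.mask]
        self.slots[self.head & self.mask] = None
        self.head += 1
        return entry


def process_queue(ring: AcceptRing, accept_cb) -> int:
    """Consumer task: drain entries in order and hand each one to accept_cb."""
    count = 0
    while (entry := ring.pop()) is not None:
        accept_cb(entry)
        count += 1
    return count
```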
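
A sketch of how pre-computed partial bit counts make it cheap to map a number to the r-th thread of a mask, as in commit b2b50a7784. The four values below are the classic SWAR popcount steps over groups of 2, 4, 8 and 16 bits (an assumption about which four counts the commit stores), and select_bit() descends through halves of the mask using them. The function names are hypothetical, the counts are recomputed on each call here whereas the commit stores them in the bind_conf, and this version counts from bit 0, which may not match the direction bind_map_thread_id() uses.

```python
def partial_counts(mask: int, width: int = 64):
    """Per-group bit counts of mask for aligned groups of 2, 4, 8 and 16 bits."""
    full = (1 << width) - 1
    m1 = full // 3       # 0x5555...: low bit of each 2-bit group
    m2 = full // 5       # 0x3333...: low half of each 4-bit group
    m4 = full // 0x11    # 0x0f0f...: low half of each byte
    m8 = full // 0x101   # 0x00ff...: low byte of each 16-bit group
    a = mask - ((mask >> 1) & m1)       # count per 2-bit group
    b = (a & m2) + ((a >> 2) & m2)      # count per 4-bit group
    c = (b + (b >> 4)) & m4             # count per 8-bit group
    d = (c + (c >> 8)) & m8             # count per 16-bit group
    return a, b, c, d


def select_bit(mask: int, r: int, width: int = 64) -> int:
    """Return the position of the r-th set bit (0-based, r < popcount), counting from bit 0."""
    a, b, c, d = partial_counts(mask, width)

    def group_count(size: int, pos: int) -> int:
        # number of set bits in the aligned group [pos, pos + size)
        if size == 1:
            return (mask >> pos) & 1
        if size == 2:
            return (a >> pos) & 0x3
        if size == 4:
            return (b >> pos) & 0xF
        if size == 8:
            return (c >> pos) & 0xFF
        if size == 16:
            return (d >> pos) & 0xFFFF
        return ((d >> pos) & 0xFFFF) + ((d >> (pos + 16)) & 0xFFFF)  # size == 32

    pos, size = 0, width
    while size > 1:
        half = size // 2
        low = group_count(half, pos)
        if r >= low:        # the wanted bit lies in the upper half
            r -= low
            pos += half
        size = half
    return pos
```

The descent takes a fixed number of steps (six for 64 bits) regardless of how many threads are set in the mask.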
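
The nbconn handling described in point 3 of commit 3f0d02bbc2 (count the connection first, mark the listener full when maxconn is reached, undo if the connection is dropped) can be sketched without a listener-wide lock. Python has no atomic compare-and-swap, so a tiny lock emulates one here; the Listener class and its methods are hypothetical and do not mirror HAProxy's functions one to one.

```python
import threading


class Listener:
    """Hypothetical listener with a connection limit (maxconn)."""

    def __init__(self, maxconn: int):
        self.maxconn = maxconn
        self.nbconn = 0
        self.full = False
        self._cas_lock = threading.Lock()   # only emulates an atomic CAS

    def _cas_nbconn(self, old: int, new: int) -> bool:
        # Emulated compare-and-swap on nbconn.
        with self._cas_lock:
            if self.nbconn != old:
                return False
            self.nbconn = new
            return True

    def reserve(self) -> bool:
        """Count a connection before accepting it; refuse once maxconn is reached."""
        while True:
            cur = self.nbconn
            if cur >= self.maxconn:
                self.full = True            # caller must stop accepting
                return False
            if self._cas_nbconn(cur, cur + 1):
                if cur + 1 == self.maxconn:
                    self.full = True        # last slot taken: mark full before accepting
                return True

    def release(self) -> None:
        """Undo a reservation (connection dropped or closed); resume if it was full."""
        while True:
            cur = self.nbconn
            if self._cas_nbconn(cur, cur - 1):
                break
        if self.full and self.nbconn < self.maxconn:
            self.full = False               # listener may accept again
```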
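
A sketch of the loop ordering fixed by commit 741b4d6b7a: the limit is tested at the top of the loop, so every connection that gets accepted has already been counted when the loop exits. The function is hypothetical, and a plain list stands in for the session-rate frequency counter.

```python
def accept_loop(pending, nbconn, maxconn, sess_rate):
    """Accept from pending until the listener's limit is reached.

    Returns (accepted, nbconn, saturated).
    """
    accepted = []
    while pending:
        if nbconn >= maxconn:      # limit test first, before accepting anything
            break
        fd = pending.pop(0)
        nbconn += 1
        sess_rate.append(fd)       # stands in for the session-rate counter update
        accepted.append(fd)
    return accepted, nbconn, nbconn >= maxconn   # saturated flag reported on exit
```

With the test at the bottom of the loop instead, the connection that reaches maxconn would leave the loop before its rate update, which is the undercount the commit describes.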
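
A sketch of the simplification behind proc_mask() in commit 6daac19b3f, assuming that a zero bind_proc means "not restricted" and is replaced by the mask of all processes, so that callers can test bits directly instead of special-casing zero. Both functions and the 4-process mask are hypothetical.

```python
ALL_PROC_MASK = 0b1111   # hypothetical: 4 processes


def proc_mask(bind_proc: int, all_proc_mask: int = ALL_PROC_MASK) -> int:
    """Effective process mask: an unset (zero) bind_proc means every process."""
    return bind_proc if bind_proc else all_proc_mask


def runs_on(bind_proc: int, proc_num: int) -> bool:
    """True if process proc_num (1-based) is covered; no null test needed by callers."""
    return bool(proc_mask(bind_proc) & (1 << (proc_num - 1)))
```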
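
A sketch of a process/thread range parser whose upper bound is a parameter instead of being hard-wired to LONGBITS, in the spirit of commit c9a82e48bf. The accepted forms ("all", "odd", "even", "N", "N-M", all 1-based) and the function name are assumptions for illustration, not HAProxy's parser.

```python
def parse_range(spec: str, max_value: int) -> int:
    """Parse "all", "odd", "even", "N" or "N-M" (1-based) into a bit mask.

    max_value is the context-dependent upper bound (thread or process limit).
    """
    if spec == "all":
        return (1 << max_value) - 1
    if spec in ("odd", "even"):
        start = 1 if spec == "odd" else 2
        return sum(1 << (n - 1) for n in range(start, max_value + 1, 2))
    low_s, sep, high_s = spec.partition("-")
    low = int(low_s)
    high = int(high_s) if sep else low
    if not 1 <= low <= high <= max_value:
        raise ValueError(f"{spec!r} is out of range 1-{max_value}")
    return sum(1 << (n - 1) for n in range(low, high + 1))
```

Passing the thread limit or the process limit as max_value lets the same parser reject values that only make sense in the other context.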

134 Commits