haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-11-23 11:51:00 +01:00

Author	SHA1	Message	Date
Willy Tarreau	de4db17dee	MINOR: lists: rename some MT_LIST operations to clarify them Initially when mt_lists were added, their purpose was to be used with the scheduler, where anyone may concurrently add the same tasklet, so it sounded natural to implement a check in MT_LIST_ADD{,Q}. Later their usage was extended and MT_LIST_ADD{,Q} started to be used in situations where the element to be added was exclusively owned by the one performing the operation so a conflict was impossible. This became more obvious with the idle connections and the new macro was called MT_LIST_ADDQ_NOCHECK. But this remains confusing and at many places it's not expected that an MT_LIST_ADD could possibly fail, and worse, at some places we start by initializing it before adding (and the test is superfluous) so let's rename them to something more conventional to denote the presence of the check or not: MT_LIST_ADD{,Q}: unconditional operation, the caller owns the element, and doesn't care about the element's current state (exactly like LIST_ADD); MT_LIST_TRY_ADD{,Q}: only perform the operation if the element is not already added or in the process of being added. This means that the previously "safe" MT_LIST_ADD{,Q} are not "safe" anymore. This also means that in case of backport mistakes in the future causing this to be overlooked, the slower and safer functions will still be used by default. Note that the missing unchecked MT_LIST_ADD macro was added. The rest of the code will have to be reviewed so that a number of callers of MT_LIST_TRY_ADDQ are changed to MT_LIST_ADDQ to remove the unneeded test.	2020-07-10 08:50:41 +02:00
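The difference between the unchecked and the checked insertion can be sketched with C11 atomics. This is a minimal illustration under stated assumptions, not HAProxy's mt_list code. The `struct elem` and `struct stack` types, the `queued` flag and the `list_add()`/`list_try_add()` names are hypothetical. A lock-free singly linked stack stands in for the doubly linked MT_LIST.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types; HAProxy's struct mt_list works differently. */
struct elem {
    struct elem *next;
    atomic_bool queued;        /* set once the element is in, or entering, a list */
};

struct stack {
    _Atomic(struct elem *) head;
};

/* Lock-free push (Treiber stack), safe with concurrent pushers. */
static void push(struct stack *s, struct elem *e)
{
    struct elem *old = atomic_load(&s->head);

    assert(e != NULL);
    do {
        e->next = old;
    } while (!atomic_compare_exchange_weak(&s->head, &old, e));
}

/* Unconditional add (the MT_LIST_ADD{,Q} contract): the caller owns the
 * element, so its current state is not checked at all. */
static void list_add(struct stack *s, struct elem *e)
{
    atomic_store(&e->queued, true);
    push(s, e);
}

/* Checked add (the MT_LIST_TRY_ADD{,Q} contract): among several concurrent
 * callers, only the one that flips the flag performs the insertion.
 * Returns true if this call added the element. */
static bool list_try_add(struct stack *s, struct elem *e)
{
    bool expected = false;

    if (!atomic_compare_exchange_strong(&e->queued, &expected, true))
        return false;          /* already added or being added */
    push(s, e);
    return true;
}
```

The checked variant pays for an extra compare-and-swap on every call. The rename makes that cost visible at each call site.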
Willy Tarreau	b2551057af	CLEANUP: include: tree-wide alphabetical sort of include files This patch fixes all the leftovers from the include cleanup campaign. There were not that many (~400 entries in ~150 files) but it was definitely worth doing it as it revealed a few duplicates.	2020-06-11 10:18:59 +02:00
Willy Tarreau	6be7849f39	REORG: include: move cfgparse.h to haproxy/cfgparse.h There's no point splitting the file in two since only cfgparse uses the types defined there. A few call places were updated and cleaned up. All of them were in C files which register keywords. There is nothing left in common/ now so this directory must not be used anymore.	2020-06-11 10:18:58 +02:00
Willy Tarreau	dfd3de8826	REORG: include: move stream.h to haproxy/stream{,-t}.h This one was not easy because it was embarking many includes with it, which other files would automatically find. At least global.h, arg.h and tools.h were identified. 93 total locations were identified, 8 additional includes had to be added. In the rare files where it was possible to finalize the sorting of includes by adjusting only one or two extra lines, it was done. But all files would need to be rechecked and cleaned up now. It was the last set of files in types/ and proto/ and these directories must not be reused anymore.	2020-06-11 10:18:58 +02:00
Willy Tarreau	aeed4a85d6	REORG: include: move log.h to haproxy/log{,-t}.h The current state of the logging is a real mess. The main problem is that almost all files include log.h just in order to have access to the alert/warning functions like ha_alert() etc, and don't care about logs. But log.h also deals with real logging as well as log-format and depends on stream.h and various other things. As such it forces a few heavy files like stream.h to be loaded early and to hide missing dependencies depending where it's loaded. Among the missing ones is syslog.h which was often automatically included resulting in no less than 3 users missing it. Among 76 users, only 5 could be removed, and probably 70 don't need the full set of dependencies. A good approach would consist in splitting that file in 3 parts: - one for error output ("errors" ?). - one for log_format processing - and one for actual logging.	2020-06-11 10:18:58 +02:00
Willy Tarreau	dcc048a14a	REORG: include: move acl.h to haproxy/acl.h{,-t}.h The files were moved almost as-is, just dropping arg-t and auth-t from acl-t but keeping arg-t in acl.h. It was useful to revisit the call places since a handful of files used to continue to include acl.h while they did not need it at all. Struct stream was only made a forward declaration since not otherwise needed.	2020-06-11 10:18:58 +02:00
Willy Tarreau	7ea393d95e	REORG: include: move connection.h to haproxy/connection{,-t}.h The type file is becoming a mess, half of it is for the proxy protocol, another good part describes conn_streams and mux ops, it would deserve being split again. At least it was reordered so that elements are easier to find, with the PP-stuff left at the end. The MAX_SEND_FD macro was moved to compat.h as it's said to be the value for Linux.	2020-06-11 10:18:58 +02:00
Willy Tarreau	cea0e1bb19	REORG: include: move task.h to haproxy/task{,-t}.h The TASK_IS_TASKLET() macro was moved to the proto file instead of the type one. The proto part was a bit reordered to remove a number of ugly forward declarations of static inline functions. About ten C and H files had their dependency dropped since they were not using anything from task.h.	2020-06-11 10:18:58 +02:00
Willy Tarreau	f268ee8795	REORG: include: split global.h into haproxy/global{,-t}.h global.h was one of the messiest files, it has accumulated tons of implicit dependencies and declares many globals that make almost all other files include it. It managed to silence a dependency loop between server.h and proxy.h by being well placed to pre-define the required structs, forcing struct proxy and struct server to be forward-declared in a significant number of files. It was split in two: one which is the global struct definition and the few macros and flags, and the other containing the function prototypes. The UNIX_MAX_PATH definition was moved to compat.h.	2020-06-11 10:18:58 +02:00
Willy Tarreau	e6ce10be85	REORG: include: move sample.h to haproxy/sample{,-t}.h This one is particularly tricky to move because everyone uses it and it depends on a lot of other types. For example it cannot include arg-t.h and must absolutely only rely on forward declarations to avoid dependency loops between vars -> sample_data -> arg. In order to address this one, it would be nice to split the sample_data part out of sample.h.	2020-06-11 10:18:58 +02:00
Willy Tarreau	213e99073b	REORG: include: move listener.h to haproxy/listener{,-t}.h stdlib and list were missing from listener.h, otherwise it was OK.	2020-06-11 10:18:58 +02:00
Willy Tarreau	f07f30c15f	REORG: include: move proto/proto_sockpair.h to haproxy/proto_sockpair.h This one didn't have any types file and was moved as-is.	2020-06-11 10:18:57 +02:00
Willy Tarreau	0f6ffd652e	REORG: include: move fd.h to haproxy/fd{,-t}.h A few includes were missing in each file. A definition of struct polled_mask was moved to fd-t.h. The MAX_POLLERS macro was moved to defaults.h. Stdio used to be silently inherited from whatever path but it's needed for list_pollers() which takes a FILE* and which thus cannot be forward-declared.	2020-06-11 10:18:57 +02:00
Willy Tarreau	48fbcae07c	REORG: tools: split common/standard.h into haproxy/tools{,-t}.h And also rename standard.c to tools.c. The original split between tools.h and standard.h dates from version 1.3-dev and was mostly an accident. This patch moves the files back to what they were expected to be, and takes care of not changing anything else. However this time tools.h was split between functions and types, because it contains a small number of commonly used macros and structures (e.g. name_desc) which in turn cause the massive list of includes of tools.h to conflict with the callers. They remain the ugliest files of the whole project and definitely need to be cleaned and split apart. A few types are defined there only for functions provided there, and some parts are even OS-specific and should move somewhere else, such as the symbol resolution code.	2020-06-11 10:18:57 +02:00
Willy Tarreau	2dd7c35052	REORG: include: move protocol.h to haproxy/protocol{,-t}.h The protocol.h files are pretty low in the dependency and (sadly) used by some files from common/. Almost nothing was changed except lifting a few comments.	2020-06-11 10:18:57 +02:00
Willy Tarreau	6634794992	REORG: include: move freq_ctr to haproxy/ types/freq_ctr.h was moved to haproxy/freq_ctr-t.h and proto/freq_ctr.h was moved to haproxy/freq_ctr.h. Files were updated accordingly, no other change was applied.	2020-06-11 10:18:56 +02:00
Willy Tarreau	92b4f1372e	REORG: include: move time.h from common/ to haproxy/ This one is included almost everywhere and used to rely on a few other .h that are not needed (unistd, stdlib, standard.h). It could possibly make sense to split it into multiple parts to distinguish operations performed on timers and the internal time accounting, but at this point it does not appear very important.	2020-06-11 10:18:56 +02:00
Willy Tarreau	af613e8359	CLEANUP: thread: rename __decl_hathreads() to __decl_thread() I can never figure whether it takes an "s" or not, and in the end it's better if it matches the file's naming, so let's call it "__decl_thread".	2020-06-11 10:18:56 +02:00
Willy Tarreau	853b297c9b	REORG: include: split mini-clist into haproxy/list and list-t.h Half of the users of this include only need the type definitions and neither the manipulation macros nor the inline functions. Moving the various types into mini-clist-t.h makes the files cleaner. The other one had all its includes grouped at the top. A few files continued to reference it without using it and were cleaned. In addition it was about time that we'd rename that file, it's not "mini" anymore and contains a bit more than just circular lists.	2020-06-11 10:18:56 +02:00
Willy Tarreau	8d36697dee	REORG: include: move base64.h, errors.h and hash.h from common to haproxy/ These ones do not depend on any other file. One used to include haproxy/api.h but that was solely for stddef.h.	2020-06-11 10:18:56 +02:00
Willy Tarreau	4c7e4b7738	REORG: include: update all files to use haproxy/api.h or api-t.h if needed All files that were including one of the following include files have been updated to only include haproxy/api.h or haproxy/api-t.h once instead: - common/config.h - common/compat.h - common/compiler.h - common/defaults.h - common/initcall.h - common/tools.h The choice is simple: if the file only requires type definitions, it includes api-t.h, otherwise it includes the full api.h. In addition, in these files, explicit includes for inttypes.h and limits.h were dropped since these are now covered by api.h and api-t.h. No other change was performed, given that this patch is large and affects 201 files. At least one (tools.h) was already freestanding and didn't get the new one added.	2020-06-11 10:18:42 +02:00
Christopher Faulet	784063eeb2	MINOR: config: Don't dump keywords if argument is NULL Helper functions are used to dump bind, server or filter keywords. These functions are used to report errors during the configuration parsing. To have a coherent API, these functions are now prepared to handle a null pointer as argument. If so, no action is performed and functions immediately return. This patch should fix the issue #631. It is not a bug. There is no reason to backport it.	2020-05-18 18:30:06 +02:00
Willy Tarreau	8d2c98b76c	BUG/MEDIUM: listener: mark the thread as not stuck inside the loop We tried hard to make sure we report threads as not stuck at various crucial places, but one of them is special, it's the listener_accept() function. The reason it is special is because it will loop a certain number of times (default: 64) accepting incoming connections, allocating resources, dispatching them to other threads or running L4 rules on them, and while all of this is supposed to be extremely fast, when the machine slows down or runs low on memory, the expectedly small delays in malloc() caused by contention with other threads can quickly accumulate and suddenly become critical to the point of triggering the watchdog. Furthermore, it is technically possible to trigger this by pure configuration by setting a huge tune.maxaccept value, which should not be possible. Given that each operation isn't related to the same task but to a different one each time, it is appropriate to mark the thread as not stuck each time it accepts new work that possibly gets dispatched to other threads which execute it. This looks like this could be a good reason for the issue reported in issue #388. This fix must be backported to 2.0.	2020-05-01 11:41:36 +02:00
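The idea of reporting progress on every iteration, rather than once per call, can be sketched as below. Everything here is hypothetical. The per-thread `stuck_hint` array, `watchdog_tick()` and `accept_loop()` illustrate the principle and are not HAProxy's watchdog or `listener_accept()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_THREADS 64

/* Armed by the watchdog on each tick, cleared by the thread on progress. */
static atomic_bool stuck_hint[MAX_THREADS];

/* Watchdog side: returns true if the thread made no progress since the
 * previous tick, and re-arms the flag for the next period. */
static bool watchdog_tick(int tid)
{
    return atomic_exchange(&stuck_hint[tid], true);
}

static void report_not_stuck(int tid)
{
    atomic_store(&stuck_hint[tid], false);
}

/* Accepts up to max_accept of the pending connections. Each one is
 * independent work, so progress is reported on every iteration: a long
 * batch slowed down by allocation contention is not mistaken for a hang. */
static int accept_loop(int tid, int max_accept, int pending,
                       void (*on_accept)(int conn))
{
    int done = 0;

    assert(tid >= 0 && tid < MAX_THREADS && max_accept > 0);
    while (done < max_accept && done < pending) {
        report_not_stuck(tid);
        if (on_accept)
            on_accept(done);
        done++;
    }
    return done;
}
```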
Ilya Shipitsin	6fb0f2148f	CLEANUP: assorted typo fixes in the code and comments This is the sixth iteration of typo fixes.	2020-04-02 16:25:45 +02:00
Jerome Magnin	eb421b2fe0	MINOR: listener: add so_name sample fetch Add a sample fetch for the name of a bind. This can be useful to take decisions when PROXY protocol is used and we can't rely on dst, such as the sample config below.
    defaults
      mode http
    listen bar
      bind 127.0.0.1:1111
      server s1 127.0.1.1:1234 send-proxy
    listen foo
      bind 127.0.1.1:1234 name foo accept-proxy
      http-request return status 200 hdr dst %[dst] if { dst 127.0.1.1 }	2020-03-29 05:47:29 +02:00
Willy Tarreau	a7da5e8dd0	BUG/MINOR: listener/mq: do not dispatch connections to remote threads when stopping When stopping there is a risk that other threads are already in the process of stopping, so let's not add new work in their queue and instead keep the incoming connection local. This should be backported to 2.1 and 2.0.	2020-03-12 19:10:29 +01:00
Willy Tarreau	618ac6ea52	CLEANUP: drop support for USE_MY_ACCEPT4 The accept4() syscall has been present for a while now, there is no more reason for maintaining our own arch-specific syscall implementation for systems lacking it in libc but having it in the kernel.	2020-03-10 07:02:46 +01:00
Willy Tarreau	50b659476c	BUG/MEDIUM: listener: only consider running threads when resuming listeners In bug #495 we found that it is possible to resume a listener on a nonexistent thread. This happens when a bind's thread_mask contains bits out of the active threads mask, such as when using "1/odd" or "1/even". The thread_mask was used as-is to pick a thread number to re-enable the listener, and given that the highest number is used, 1/odd or 1/even can produce quite high thread numbers and crash the process by queuing some entries into nonexistent lists. This bug stems from an incomplete fix in commit 413e926ba ("BUG/MAJOR: listener: fix thread safety in resume_listener()") though it will only trigger if some bind lines are explicitly bound to thread numbers higher than the thread count. The fix must be backported to all branches having the fix above (as far as 1.8, though the code is different there, see the commit message in 1.8 for changes). There are a few other places where bind_thread is used without enforcing all_thread_mask, namely when doing fd_insert() while creating listeners. It seems harmless but would probably deserve another fix.	2020-02-12 10:21:33 +01:00
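The masking step can be sketched as follows. The helper name `pick_resume_thread()` and the choice to return -1 when no allowed thread is running are assumptions. The only point taken from the message is that the bind's mask must be intersected with the mask of running threads before the highest thread number is picked. The test assumes "1/odd" sets every other bit starting at bit 0.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the index of the highest thread allowed by the bind line that
 * actually exists, or -1 if none of the allowed threads is running. */
static int pick_resume_thread(uint64_t bind_mask, uint64_t all_threads_mask)
{
    uint64_t m = bind_mask & all_threads_mask;   /* the missing step */
    int t = -1;

    assert(all_threads_mask != 0);               /* at least one thread runs */
    while (m) {                                  /* find the highest set bit */
        m >>= 1;
        t++;
    }
    return t;
}
```

With 4 threads, a "1/odd" mask of 0x5555555555555555 yields thread index 2 once masked, instead of 62 without the mask.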
Willy Tarreau	eeea8082a8	BUG/MAJOR: listener: do not schedule a task-less proxy Apparently harmless commit 0591bf7deb ("MINOR: listener: make the wait paths cleaner and more reliable") caused a nasty regression and revealed a rare race that hits regtest stickiness/lb-services.vtc about 4% of the time for 8 threads. The problem is that when a multi-threaded listener wakes up on an incoming connection, several threads can receive the event, especially when idle. And all of them will race to accept the connections in parallel, adjusting the listener's nbconn and proxy's feconn until one reaches the proxy's limit and declines. At this step the changes are cancelled, the listener is marked "limited", and when the threads exit the function, one of them will unlimit the listener/proxy again so that it can accept incoming connections again. The problem happens when many threads connect to a small peers section because its maxconn is very limited (typically 6 for 2 peers), and it's sometimes possible for enough competing threads to hit the limit and one of them will limit the listener and queue the proxy's task... except that peers do not initialize their proxy task since they do not use rate limiting. Thus the process crashes when doing task_schedule(p->task). Prior to the cleanup patch above, this didn't happen because the error path that was dedicated to only limiting the listener did not call task_schedule(p->task). Given that the proxy's task is optional, and that the expire value passed there is always TICK_ETERNITY, it's sufficient and reasonable to avoid calling this task_schedule() when expire is not set. And for long term safety we can also avoid doing it when the task is not set. A first fix consisted in allocating a task for the peers proxies but it's never used and would eat resources for no reason. No backport is needed as this commit was only merged into 2.2.	2020-01-08 19:39:09 +01:00
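The guarded call described above comes down to two conditions. In this sketch `struct task`, `struct proxy`, the simplified `task_schedule()` and `limit_proxy()` are stand-ins rather than HAProxy's definitions. `TICK_ETERNITY` is defined locally as 0 ("no expiry") for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TICK_ETERNITY 0          /* "no expiry date", defined here for the sketch */

struct task  { int expire; int scheduled; };
struct proxy { struct task *task; };   /* task may be NULL, e.g. for peers */

static void task_schedule(struct task *t, int when)
{
    assert(t != NULL);           /* the dereference that used to crash */
    t->expire = when;
    t->scheduled = 1;
}

/* Only wake the proxy's task when there is a date to wait for and the
 * proxy actually has a task. */
static void limit_proxy(struct proxy *p, int expire)
{
    if (expire != TICK_ETERNITY && p->task)
        task_schedule(p->task, expire);
}
```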
Willy Tarreau	cdcba115b8	BUG/MINOR: listener: do not immediately resume on transient error The listener supports a "transient error" situation, which corresponds to those situations where accept fails badly but poll() reports an event. This happens for example when a listener is paused, or on out of FD. The same mechanism is used when facing a maxconn or maxsessrate limitation. When this happens, the listener is disabled for up to 100ms and put back into the global listener queue so that it automatically wakes up again as soon as the conditions change from an existing connection releasing one resource, or the system recovers from a transient issue. The listener_accept() function has a bug in its exit path causing a freshly limited listener to be immediately enabled again because all the conditions are met (connection count < max). It doesn't take into account the fact that the listener might have been queued and must first wait for the timeout to expire before doing so. The impact is that upon certain errors, the faulty process will busy loop on the accept code without sleeping. This is the scenario reported and diagnosed by @hedong0411 in issue #382. This commit fixes it by verifying that the global queue's delay is at least expired before deciding to resume the listener. Another approach could consist in having an extra state like LI_DELAY for situations where only a delay is acceptable, but this would probably not bring anything except more complex code. This issue was introduced with the lock-free listener accept code (commits 3f0d02b and 82c9789a) that were backported to 1.8.20+ and 1.9.7+, so this fix must be backported to the relevant branches.	2019-12-11 15:06:30 +01:00
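The resume condition can be sketched with wrapping tick arithmetic. The helpers below are assumptions loosely modeled on HAProxy's ticks: 0 means "not set", and comparisons use the signed difference so that wrap-around is handled. `may_resume()` is a hypothetical name for the check added to the exit path.

```c
#include <assert.h>
#include <stdbool.h>

#define TICK_ETERNITY 0u

/* True once "now" has reached "timer"; an unset timer never expires.
 * The signed difference keeps working after the tick counter wraps
 * (on the usual two's-complement targets). */
static bool tick_is_expired(unsigned int timer, unsigned int now)
{
    if (timer == TICK_ETERNITY)
        return false;
    return (int)(now - timer) >= 0;
}

/* A limited listener may be re-enabled only when it is below its limit
 * AND the delay it was queued for (if any) has elapsed. Checking only the
 * connection count is what caused the busy loop. */
static bool may_resume(int nbconn, int maxconn,
                       unsigned int queue_expire, unsigned int now)
{
    assert(maxconn > 0);
    if (nbconn >= maxconn)
        return false;
    return queue_expire == TICK_ETERNITY || tick_is_expired(queue_expire, now);
}
```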
Willy Tarreau	a1d97f88e0	REORG: listener: move the global listener queue code to listener.c The global listener queue code and declarations were still lying in haproxy.c while not needed there anymore at all. This complicates the code for no reason. As a result, the global_listener_queue_task and the global_listener_queue were made static.	2019-12-10 14:16:03 +01:00
Willy Tarreau	241797a3fc	MINOR: listener: split dequeue_all_listener() in two We use it half of the time for the global_listener_queue and the other half for a proxy's queue, and this requires the callers to take care of these. Let's split it in two versions, the current one working only on the global queue and another one dedicated to proxies for the per-proxy queues. This cleans up quite a bit of code.	2019-12-10 14:14:09 +01:00
Willy Tarreau	0591bf7deb	MINOR: listener: make the wait paths cleaner and more reliable In listener_accept() there are several situations where we have to wait for an event or a delay. These ones all implement their own call to limit_listener() and the associated task_schedule(). In addition to being ugly and confusing, one expire date computation is even wrong as it doesn't take into account the fact that we're using threads and that the value might change in the middle. Fortunately task_schedule() gets it right for us. This patch creates two jump locations, one for the global queue and one for the proxy queue, allowing the rest of the code to only compute the expire delay and jump to the right location.	2019-12-10 12:04:27 +01:00
Willy Tarreau	92079934a9	BUG/MEDIUM: listener/threads: fix a remaining race in the listener's accept() Recent fix 4c044e274c ("BUG/MEDIUM: listener/thread: fix a race when pausing a listener") is insufficient and moves the race slightly farther. What now happens is that if we're limiting a listener due to a transient error such as an accept() error, or because the proxy's maxconn was reached, another thread might in the meantime have switched again to LI_READY and at the end of the function we'll disable polling on this FD, resulting in a listener that never accepts anything anymore. It can more easily happen when sending SIGTTOU/SIGTTIN to temporarily pause the listeners to let another process bind next to them. What this patch does instead is to move all enable/disable operations to the end of the function and make them conditional on the state. The listener's state is checked under the lock and the FD's polling state adjusted accordingly so that the listener's state and the FD always remain 100% synchronized. It was verified with 16 threads that the cost of taking that lock is not measurable so that's fine. This should be backported to the same branches the patch above is backported to.	2019-12-10 10:43:31 +01:00
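The principle can be sketched as follows. The FD's polling state is derived from the listener's state under the lock, once, when leaving the function. The state names echo those in these messages, but the enum values, the `polling` boolean standing in for the FD's receive-polling bit, and `sync_polling()` are hypothetical. The guarantee also assumes that every state change is made under the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum li_state { LI_READY, LI_LIMITED, LI_PAUSED };   /* illustrative subset */

struct listener {
    pthread_mutex_t lock;  /* assumed to protect every write to state */
    enum li_state state;
    bool polling;          /* stands in for "FD wants to receive" */
};

/* Called once when leaving the accept function. Whatever other threads did
 * in the meantime, polling is recomputed from the state while holding the
 * lock, so the two agree when the lock is released. */
static void sync_polling(struct listener *l)
{
    assert(l != NULL);
    pthread_mutex_lock(&l->lock);
    l->polling = (l->state == LI_READY);
    pthread_mutex_unlock(&l->lock);
}
```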
Willy Tarreau	20aeb1c7cd	BUG/MINOR: listener: also clear the error flag on a paused listener When accept() fails because a listener is temporarily paused, the FD might have both FD_POLL_HUP and FD_POLL_ERR bits set. While we do not exploit FD_POLL_ERR here it's better to clear it because it is reported on "show fd" and is confusing. This may be backported to all versions.	2019-12-10 10:43:31 +01:00
Willy Tarreau	7cdeb61701	BUG/MINOR: listener/threads: always use atomic ops to clear the FD events There was a leftover of the single-threaded era when removing the FD_POLL_HUP flag from the listeners. By not using an atomic operation to clear the flag, another thread acting on the same listener might have lost some events, though this would have resulted in that thread reprocessing them immediately on the next loop pass. This should be backported as far as 1.8.	2019-12-10 10:43:31 +01:00
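The difference between the leftover and the fix is a plain load/store pair versus a single atomic read-modify-write. The event bit values below are made up for the example and do not match HAProxy's FD_POLL_* constants. `clear_events()` and `clear_events_racy()` are hypothetical helpers. Clearing EV_HUP and EV_ERR together mirrors the FD_POLL_HUP/FD_POLL_ERR case in the preceding entry.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative event bits only. */
#define EV_IN  0x01u
#define EV_ERR 0x08u
#define EV_HUP 0x10u

/* Racy version (single-threaded era): a bit set by another thread between
 * the load and the store is overwritten and lost. */
static void clear_events_racy(atomic_uint *ev, unsigned int bits)
{
    unsigned int v = atomic_load(ev);
    atomic_store(ev, v & ~bits);
}

/* Atomic version: one fetch_and, so concurrently set bits survive. */
static void clear_events(atomic_uint *ev, unsigned int bits)
{
    atomic_fetch_and(ev, ~bits);
}
```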
Willy Tarreau	4c044e274c	BUG/MEDIUM: listener/thread: fix a race when pausing a listener There exists a race in the listener code where a thread might disable receipt on a listener's FD then turn it to LI_PAUSED while at the same time another one faces EAGAIN on accept() and enables it again via fd_cant_recv(). The result is that the FD is in LI_PAUSED state with its polling still enabled. listener_accept() does not do anything then and doesn't disable the FD either, resulting in a thread eating all the CPU as reported in issue #358. A solution would be to take the listener's lock to perform the fd_cant_recv() call and do it only if the FD is still in LI_READY state, but this would be totally overkill while in practice the issue only happens during shutdown. Instead what is done here is that when leaving we recheck the state and disable polling if the listener is not in LI_READY state, which never happens except when being limited. In the worst case there could be one extra check per thread for the time required to converge, which is absolutely nothing. This fix was successfully tested, and should be backported to all versions using the lock-free listeners, which means all those containing commit 3f0d02bb ("MAJOR: listener: do not hold the listener lock in listener_accept()"), hence 2.1, 2.0, 1.9.7+, 1.8.20+.	2019-12-05 07:40:32 +01:00
Willy Tarreau	93604edb65	BUG/MEDIUM: listeners: always pause a listener on out-of-resource condition A corner case was opened in the listener_accept() code by commit 3f0d02bbc2 ("MAJOR: listener: do not hold the listener lock in listener_accept()"). The issue is when one listener (or a group of them) managed to eat all the proxy's or all the process's maxconn, and another listener tries to accept a new socket. This results in the atomic increment detecting the excess connection count and immediately aborting, without pausing the listener, thus the call is immediately performed again. This doesn't happen when the test is run on a single listener because this listener got limited when crossing the limit. But with 2 or more listeners, we don't have this luxury. The solution consists in limiting the listener as soon as we have to decline accepting an incoming connection. This means that the listener will not be marked full yet if it gets the exact connection count but this is not a problem in practice since all other listeners will only be marked full after their first attempt. Thus from now on, a listener is only full once it has already failed taking an incoming connection. This bug was definitely responsible for the unreproducible occasional reports of high CPU usage showing epoll_wait() returning immediately without accepting an incoming connection, like in bug #129. This fix must be backported to 1.9 and 1.8.	2019-11-15 10:34:51 +01:00
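The "limit as soon as we decline" rule can be sketched with an atomic counter shared by several listeners. The structures and `try_take_conn()` are hypothetical simplifications. A single proxy-wide counter stands in for both the proxy's and the process's limits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct proxy    { atomic_int feconn; int maxconn; };
struct listener { struct proxy *fe; atomic_bool limited; };

/* Reserve one slot in the shared counter. On overflow the increment is
 * rolled back and THIS listener is limited right away, even though another
 * listener consumed the slots, so it stops polling instead of retrying
 * immediately. Returns true if a slot was obtained. */
static bool try_take_conn(struct listener *l)
{
    struct proxy *p = l->fe;
    int n;

    assert(p != NULL);
    n = atomic_fetch_add(&p->feconn, 1) + 1;
    if (n > p->maxconn) {
        atomic_fetch_sub(&p->feconn, 1);
        atomic_store(&l->limited, true);
        return false;
    }
    return true;
}
```

Listener `a` reaching the exact limit is not marked as limited. It only is on its next failed attempt, matching the behavior described in the message.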
Willy Tarreau	2bd65a781e	OPTIM: listeners: use tasklets for the multi-queue rings Now that we can wake up a remote thread's tasklet, it's way more interesting to use a tasklet than a task in the accept queue, as it will avoid passing through all the scheduler. Just doing this increases the accept rate by about 4%, overall recovering the slight loss introduced by the tasklet change. In addition it makes sure that even a heavily loaded scheduler (e.g. many very fast checks) will not delay a connection accept.	2019-09-24 06:57:32 +02:00
Olivier Houchard	859dc80f94	MEDIUM: list: Separate "locked" list from regular list. Instead of using the same type for regular linked lists and "autolocked" linked lists, use a separate type, "struct mt_list", for the autolocked one, and introduce a set of macros, similar to the LIST_* macros, with the MT_ prefix. When we use the same entry for both regular list and autolocked list, as is done for the "list" field in struct connection, we now have to explicitly cast it to struct mt_list when using MT_ macros.	2019-09-23 18:16:08 +02:00
Christopher Faulet	ad6c2eac28	BUG/MINOR: listener: Fix a possible null pointer dereference It seems to be possible to have no frontend for a listener. A test was missing before dereferencing it at the end of the function listener_accept(). This patch fixes the issue #264. It must be backported to 2.0 and 1.9.	2019-09-10 10:29:54 +02:00
Willy Tarreau	6ee9f8df3b	BUG/MEDIUM: listener/threads: fix an AB/BA locking issue in delete_listener() The delete_listener() function takes the listener's lock before taking the proto_lock, which is contrary to what other functions do, possibly causing an AB/BA deadlock. In practice the two only places where both are taken are during protocol_enable_all() and delete_listener(), the former being used during startup and the latter during stop. In practice during reload floods, it is technically possible for a thread to be initializing the listeners while another one is stopping. While this is too hard to trigger on 2.0 and above due to the synchronization of all threads during startup, it's reasonably easy to do in 1.9 by having hundreds of listeners, starting 64 threads and flooding them with reloads like this : $ while usleep 50000; do killall -USR2 haproxy; done Usually in less than a minute, all threads will be deadlocked. The fix consists in always taking the proto_lock before the listener lock. It seems to be the only place where these two locks were reversed. This fix needs to be backported to 2.0, 1.9, and 1.8.	2019-08-26 11:07:09 +02:00
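The fix amounts to one global lock ordering. In this sketch the `proto_lock` mutex, the reduced `struct listener` and `unlink_listener()` are hypothetical stand-ins for the real delete path. The only rule taken from the message is "proto_lock first, then the listener's lock".

```c
#include <assert.h>
#include <pthread.h>

/* Global lock protecting the protocols' listener lists. */
static pthread_mutex_t proto_lock = PTHREAD_MUTEX_INITIALIZER;

struct listener {
    pthread_mutex_t lock;
    int linked;                  /* 1 while on a protocol's list */
};

/* Lock order used everywhere: proto_lock first, then the listener's lock.
 * Taking them in the opposite order here while another thread (e.g. one
 * enabling all protocols) takes proto_lock then a listener's lock is the
 * classic AB/BA deadlock. */
static void unlink_listener(struct listener *l)
{
    assert(l != NULL);
    pthread_mutex_lock(&proto_lock);
    pthread_mutex_lock(&l->lock);
    if (l->linked) {
        /* removal from the protocol's list would happen here */
        l->linked = 0;
    }
    pthread_mutex_unlock(&l->lock);
    pthread_mutex_unlock(&proto_lock);
}
```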
Willy Tarreau	daacf36645	BUG/MEDIUM: protocols: add a global lock for the init/deinit stuff Dragan Dosen found that the listeners lock is not sufficient to protect the listeners list when proxies are stopping because the listeners are also unlinked from the protocol list, and under certain situations like bombing with soft-stop signals or shutting down many frontends in parallel from multiple CLI connections, it could be possible to provoke multiple instances of delete_listener() to be called in parallel for different listeners, thus corrupting the protocol lists. Such operations are pretty rare, they are performed once per proxy upon startup and once per proxy on shut down. Thus there is no point trying to optimize anything and we can use a global lock to protect the protocol lists during these manipulations. This fix (or a variant) will have to be backported as far as 1.8.	2019-07-24 16:45:02 +02:00
Willy Tarreau	f2cb169487	BUG/MAJOR: listener: fix thread safety in resume_listener() resume_listener() can be called from a thread not part of the listener's mask after a curr_conn has gone lower than a proxy's or the process' limit. This results in fd_may_recv() being called unlocked if the listener is bound to only one thread, and quickly locks up. This patch solves this by creating a per-thread work_list dedicated to listeners, and modifying resume_listener() so that it bounces the listener to one of its owning thread's work_list and waking it up. This thread will then call resume_listener() again and will perform the operation on the file descriptor itself. It is important to do it this way so that the listener's state cannot be modified while the listener is being moved, otherwise multiple threads can take conflicting decisions and the listener could be put back into the global queue if the listener was used at the same time. It seems like a slightly simpler approach would be possible if the locked list API would provide the ability to return a locked element. In this case the listener would be immediately requeued in dequeue_all_listeners() without having to go through resume_listener() with its associated lock. This fix must be backported to all versions having the lock-less accept loop, which is as far as 1.8 since deadlock fixes involving this feature had to be backported there. It is expected that the code should not differ too much there. However, previous commit "MINOR: task: introduce work lists" will be needed as well and should not present difficulties either. For 1.8, the commits introducing thread_mask() and LIST_ADDED() will be needed as well, either backporting my_flsl() or switching to my_ffsl() will be OK, and some changes will have to be performed so that the init function is properly called (and maybe the deinit one can be dropped). 
In order to test for the fix, simply set up a multi-threaded frontend with multiple bind lines each attached to a single thread (reproduced with 16 threads here), set up a very low maxconn value on the frontend, and inject heavy traffic on all listeners in parallel with slightly more connections than the configured limit (typically +20%) so that it flips very frequently. If the bug is still there, at some point (5-20 seconds) the traffic will go much lower or even stop, either with spinning threads or not.	2019-07-12 09:07:48 +02:00
Christopher Faulet	102854cbba	BUG/MEDIUM: listener: Fix how unlimited number of consecutive accepts is handled There is a bug when global.tune.maxaccept is set to -1 (no limit). It is pretty visible with one process (nbproc set to 1). The functions listener_accept() and accept_queue_process() don't expect to handle negative maxaccept values. So instead of accepting incoming connections without any limit, none are ever accepted and HAProxy loops infinitely in the scheduler. When there are 2 or more processes, the bug is a bit more subtle. The limit for a listener is set to 1. So only one connection is accepted at a time by a given listener. This happens because the listener's maxaccept value is an unsigned integer. In check_config_validity(), it is first set to UINT_MAX (-1 cast to an unsigned integer), and then some calculations on it lead to an integer overflow. To fix the bug, the listener's maxaccept value is now a signed integer. So, if a negative value is set for global.tune.maxaccept, we keep it untouched for the listener and no calculation is made on it. Then, in the listener code, this signed value is cast to an unsigned one. It simplifies all tests instead of dealing with negative values. So, it limits the number of connections accepted at a time to UINT_MAX at most. But, honestly, it is not an issue. This patch must be backported to 1.9 and 1.8.	2019-04-30 15:28:29 +02:00
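The "keep it signed, cast where consumed" approach can be sketched as follows. `listener_maxaccept()` and `accept_budget()` are hypothetical helpers. The per-process split (a rounded-up division by nbproc, clamped to at least 1) is an illustrative assumption, not HAProxy's actual formula. The point is that a negative value bypasses the arithmetic entirely, whereas any arithmetic applied to UINT_MAX risks wrapping around.

```c
#include <assert.h>
#include <limits.h>

/* Per-listener value derived from the global setting. A negative global
 * value means "no limit" and is passed through untouched. The split below
 * is illustrative and avoids signed overflow near INT_MAX. */
static int listener_maxaccept(int global_maxaccept, int nbproc)
{
    int v;

    assert(nbproc > 0);
    if (global_maxaccept < 0)
        return global_maxaccept;
    v = global_maxaccept / nbproc + (global_maxaccept % nbproc != 0);
    return v > 0 ? v : 1;
}

/* Where the value is consumed, the cast maps -1 to UINT_MAX, so the loop
 * bound is practically unlimited without special-casing negatives. */
static unsigned int accept_budget(int maxaccept)
{
    return (unsigned int)maxaccept;
}
```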
Willy Tarreau	85d0424b20	BUG/MINOR: listener/mq: correctly scan all bound threads under low load When iterating on the CLI using "show activity" and no other load, it was visible that the last thread was always skipped. This was caused by the way the thread bits were walked: t1 was updated after t2 to make sure it never equals t2 (thus it skips t2), and in case of a tie we choose t1. This results in the chosen thread never being t2 unless the other ones already have one connection. In addition to this, t2 was recalculated upon each pass due to the fact that only the 31st bit was looked at instead of looking at the t2'th bit. This patch fixes this by updating t2 after t1 so that t1 is free to walk over all positions under equal load. No measurable performance gain is expected from this, but it at least removes one strange indicator which could lead to some suspicion. No backport is needed.	2019-04-16 18:09:13 +02:00
Willy Tarreau	64a9c05f37	MINOR: cli/listener: report the number of accepts on "show activity" The "show activity" command reports the number of incoming connections dispatched per thread but doesn't report the number of connections received by each thread. It is important to be able to monitor this value as it can show that for whatever reason a smaller set of threads is receiving the connections and dispatching them to all other ones.	2019-04-12 15:54:15 +02:00
Willy Tarreau	0d858446b6	BUG/MINOR: listener: renice the accept ring processing task It is not acceptable that the accept queues are handled with a normal priority since they are supposed to quickly dispatch the incoming traffic, resulting in tasks which will have their respective nice values and place in the queue. Let's renice the accept ring tasks to -1024. No backport is needed, this is strictly 2.0.	2019-04-12 15:54:03 +02:00
David Carlier	5671662f08	BUILD/MINOR: listener: Silent a few signedness warnings. Silencing a couple of warnings related to signedness, due to a mismatch of signed and unsigned ints with l->nbconn, actconn and p->feconn.	2019-03-27 17:37:44 +01:00
Willy Tarreau	57cb506df8	BUILD: listener: shut up a build warning when threads are disabled We get this with __decl_hathreads due to the lone semi-colon, let's move it at the end of the innermost declaration : src/listener.c: In function 'listener_accept': src/listener.c:601:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]	2019-03-15 17:17:33 +01:00

161 Commits