Commit Graph

199 Commits

Olivier Houchard
f98a8c317e BUG/MEDIUM: fd: don't wait for tmask to stabilize if we're not in it.
In fd_update_events(), we loop until there's no bit in the running_mask
that is not in the thread_mask. Problem is, the thread sets its
running_mask bit before that loop, and so if 2 threads do the same, and
a 3rd one just closes the FD and sets the thread_mask to 0, then
running_mask will always be non-zero, and we will loop forever. This is
trivial to reproduce when using a DNS resolver that will just answer
"port unreachable", but could theoretically happen with other types of
file descriptors too.

To fix that, just don't bother looping if we're no longer in the
thread_mask: if that happens, we know we won't have to take care of
the FD anyway.

This should be backported to 2.7, 2.6 and 2.5.
2023-04-13 18:04:46 +02:00
Amaury Denoyelle
53fc98c3bc MINOR: fd: implement fd_migrate_on() to migrate on a non-local thread
fd_migrate_on() can be used to migrate an existing FD to any thread, even
one belonging to a different group from the current one and from the
caller's. All that is needed is to make sure the FD is still valid when
the operation is performed (which is the case when such operations happen).

This is potentially slightly expensive since it locks the tgid during the
delicate operation, but it is normally performed only from an owning
thread to offer the FD to another one (e.g. reassign a better thread upon
accept()).
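A hypothetical call site might look like this (the helper name is an
assumption; only fd_migrate_on() itself is real):

    /* after accept(): hand the FD over to a less loaded thread, which
     * may live in another thread group; the FD is known valid here */
    static void offer_connection(int fd)
    {
        unsigned int new_tid = pick_less_loaded_thread(); /* illustrative */

        fd_migrate_on(fd, new_tid);
    }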
2023-04-13 16:57:51 +02:00
Willy Tarreau
cd8914bc52 BUG/MAJOR: fd/threads: close a race on closing connections after takeover
As mentioned in commit 237e6a0d6 ("BUG/MAJOR: fd/thread: fix race between
updates and closing FD"), a race was found during stress tests involving
heavy backend connection reuse with many competing closes.

Here the problem is complex. The analysis in commit f69fea64e ("MAJOR:
fd: get rid of the DWCAS when setting the running_mask") that removed
the DWCAS in 2.5 overlooked a few races.

First, a takeover from thread1 could happen just after fd_update_events()
in thread2 validates it holds the tmask bit in the CAS loop. Since thread1
releases running_mask after the operation, thread2 will succeed the CAS
and both will believe the FD is theirs. This does explain the occasional
crashes seen with h1_io_cb() being called on a bad context, or
sock_conn_iocb() seeing conn->subs vanish after checking it. This issue
can be addressed using a DWCAS in both fd_takeover() and fd_update_events()
as it was before the patch above but this is not portable to all archs and
is not easy to adapt for those lacking it, due to some operations still
happening only on individual masks after the thread groups were added.

Second, the checks after fd_clr_running() for the current thread being
the last one are not sufficient: at the exact moment the operation
completes, another thread may also set and drop the running bit and see
itself as alone, and both can call _fd_delete_orphan() in parallel. In
order to prevent this from happening, we cannot rely on the absence of
others, we need an explicit flag indicating that the FD must be closed.
One approach that was attempted consisted in playing with the thread_mask
but that was not reliable since it could still match between the late
deletion and the early insertion that follows. Instead, a new FD flag
was added, FD_MUST_CLOSE, that exactly indicates that the call to
_fd_delete_orphan() must be done. It is set by fd_delete(), and
atomically cleared by the first one which checks it, and which is the
only one to call _fd_delete_orphan().
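The semantics of the new flag can be modelled standalone like this (the
flag value and the helper are illustrative):

    #include <stdatomic.h>

    #define FD_MUST_CLOSE 0x0004u  /* illustrative bit value */

    /* whoever clears FD_MUST_CLOSE first is the unique closer */
    static void try_close_orphan(_Atomic unsigned int *flags)
    {
        unsigned int old = atomic_fetch_and(flags, ~FD_MUST_CLOSE);

        if (old & FD_MUST_CLOSE) {
            /* we atomically won the race: only this thread may run
             * _fd_delete_orphan() on the FD */
        }
    }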

With both points addressed, there's no more visible race left:

- takeover() only happens under the connection list's lock and cannot
  compete with fd_delete() since fd_delete() must first remove the
  connection from the list before deleting the FD. That's also why it
  doesn't need to call _fd_delete_orphan() when dropping its running
  bit.

- takeover() sets its running bit then atomically replaces the thread
  mask, so that until that's done, it doesn't validate the condition
  to end the synchronization loop in fd_update_events(). Once it's OK,
  the previous thread's bit is lost, and this is checked for in
  fd_update_events().

- fd_update_events() can compete with fd_delete() at various places
  which are explained above. Since fd_delete() clears the thread mask
  only after setting its running bit and after setting the FD_MUST_CLOSE
  bit, the synchronization loop guarantees that the thread mask is seen
  before going further, and that once it's seen, the FD_MUST_CLOSE flag
  is already present.

- fd_delete() may start while fd_update_events() has already started,
  but fd_delete() must hold a bit in thread_mask before starting, and
  that is checked by the first test in fd_update_events() before setting
  the running_mask.

- the poller's _update_fd() will not compete against _fd_delete_orphan()
  nor fd_insert() thanks to the fd_grab_tgid() that's always done before
  updating the polled_mask, and guarantees that we never pretend that a
  polled_mask has a bit before the FD is added.

The issue is very hard to reproduce and is extremely time-sensitive.
Reproducing it required a 1-ms timeout and request rates closely
matching 1 kHz per server, though certain tests sometimes benefitted
from saturation. It was found that adding the following
slowdown at a few key places helped a lot and managed to trigger the
bug in 0.5 to 5 seconds instead of tens of minutes on a 20-thread
setup:

    { volatile int i = 10000; while (i--); }

Particularly, placing it at key places where only one of running_mask
or thread_mask is set and not the other one yet (e.g. after the
synchronization loop in fd_update_events or after dropping the
running bit) did yield great results.

Many thanks to Olivier Houchard for this expert help analysing these
races and reviewing candidate fixes.

The patch must be backported to 2.5. Note that 2.6 does not have tgid
in FDs, and that it requires a change of output on fd_clr_running() as
we need the previous bit. This is provided by carefully backporting
commit d6e1987612 ("MINOR: fd: make fd_clr_running() return the previous
value instead"). Tests have shown that the lack of tgid is a showstopper
for 2.6 and that unless a better workaround is found, it could still be
preferable to backport the minimum pieces required for fd_grab_tgid()
to 2.6 so that it remains stable over time.
2023-03-09 14:01:48 +01:00
Willy Tarreau
237e6a0d65 BUG/MAJOR: fd/thread: fix race between updates and closing FD
While running some L7 retries tests, Christopher and I stumbled upon a
very strange behavior showing some occasional server timeouts when the
server closes keep-alive connections quickly. The issue can be
reproduced with the following config:

    global
        expose-experimental-directives
        #tune.fd.edge-triggered on   # can speed up the issue

    defaults
        mode http
        timeout client 5s
        timeout server 10s
        timeout connect 2s

    listen f
        bind :8001
        http-reuse always
        retry-on all-retryable-errors
        server next 127.0.0.1:8002

    frontend b
        bind :8002
        timeout http-keep-alive 1  # one ms
        redirect location /

Sending fast requests without reusing the client connection on port 8001
with a single connection and at least 3 threads on haproxy occasionally
shows glitches and pauses (below with timeout server 2s):

  $ taskset -c 2,3 h1load  -e -t 1 -r 1 -c 1 http://127.0.0.1:8001/
  #     time conns tot_conn  tot_req      tot_bytes    err  cps  rps  bps   ttfb
           1     1     9794     9793         959714      0 9k79 9k79 7M67 42.94u
           2     1     9794     9793         959714      0 0.00 0.00 0.00    -
           3     1     9794     9793         959714      0 0.00 0.00 0.00    -
           4     0    16015    16015        1569470      0 6k22 6k22 4M87 522.9u
           5     0    18657    18656        1828190      2 2k63 2k63 2M06 39.22u

If this doesn't happen, limiting to a request rate close to 1/timeout
may help.

What is happening is that after several migrations, a late report
via fd_update_events() may detect that the thread is not welcome, and
will want to program an update so that the current thread's poller
disables its polling on it. It is allowed to do so because it used
fd_grab_tgid(). But what if _fd_delete_orphan() was just starting to
be called and already reset the update_mask? We'll end up with a bit
present in the update mask, then _fd_delete_orphan() resets the tgid,
which will prevent the poller from consuming that update. The update
is not needed anymore since the FD was closed, but in this case nobody
will clear this bit until the same FD is reused again and cleared. And
as long as the thread's bit remains in the update_mask, no new updates
will be programmed for the next use of this FD on the same thread since
due to the bit being present, fd_nbupdt will not be changed. This is
what is causing this timeout.

The fix consists in making sure _fd_delete_orphan() waits for the
occasional watchers to leave, and to do this before clearing the
update_mask. This will be either fd_update_events() trying to check
its thread_mask, or the poller checking its updates, so that's pretty
short. But it definitely closes this race.
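A standalone model of the fixed ordering (a sketch, under the assumption
that the watchers hold a reference on the FD's tgid as described above):

    #include <stdatomic.h>
    #include <sched.h>

    static void delete_orphan_model(_Atomic unsigned int *tgid_refcnt,
                                    _Atomic unsigned long *update_mask)
    {
        /* we hold one reference ourselves; wait for the short-lived
         * watchers (a late fd_update_events() or a poller consuming
         * updates) to drop theirs */
        while (atomic_load(tgid_refcnt) > 1)
            sched_yield();

        /* only now may the update_mask be wiped: no stale bit can
         * survive the close anymore */
        atomic_store(update_mask, 0);
        /* ... reset the tgid, then close() ... */
    }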

This fix is needed since the introduction of fd_grab_tgid(), hence 2.7.

Note that while testing the fix, another related issue concerning the
atomicity of running_mask vs thread_mask popped up and will have to be
fixed till 2.5 as part of another patch. It may make the tests for this
fix occasionally trigger a few BUG_ON() or face a null conn->subs in
sock_conn_iocb(), though these ones are much more difficult to trigger.
This is not caused by this fix.
2023-03-07 07:09:59 +01:00
Willy Tarreau
061754b249 BUG/MEDIUM: fd: make fd_delete() support being called from a different group
There's currently a problem affecting thread groups. Stopping a listener
from a different group than the one that runs this listener will trigger
the BUG_ON() in fd_delete(). This typically happens by issuing "disable
frontend f" on the CLI for the following config since the CLI runs on
group 1:

    global
        nbthread 2
        thread-groups 2
        stats socket /tmp/sock1 level admin

    frontend f
        mode http
        bind abns@frt-sock thread 2

This happens because abns sockets cannot be suspended so here this
requires a full stop.

A first approach would consist in isolating the caller during such rare
operations but it turns out that fd_delete() is not robust against even
such calling conditions, because it uses its own thread mask with an FD
that may be in a different group, and even though the threads would be
isolated and running_mask should be zero, we must not mix thread masks
from different groups like this.

A better solution consists in replacing the bug condition detection with
a self-protection. After all it's not trivial to figure all likely call
places, and forcing upper layers to protect the code is not clean if we
can do it at the bottom. Thus this is what is being done now. We detect
a thread group mismatch, and if so, we forcefully isolate ourselves and
entirely clean the socket. This has the merit of being much more robust
and easier to use (and harder to misuse). Given that such operations are
very rare (and that, until now, they would simply crash), it's not a
problem to waste some time isolating the caller there.
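Condensed, the self-protection looks like the following sketch
(thread_isolate(), thread_release(), thread_isolated() and fd_tgid() are
real haproxy primitives; the body is heavily simplified):

    void fd_delete(int fd)
    {
        /* caller is in the wrong group: isolate and clean up entirely */
        if (fd_tgid(fd) != ti->tgid && !thread_isolated()) {
            thread_isolate();  /* wait for all threads to be harmless */
            /* ... fully release masks, updates and tgid, then close ... */
            thread_release();
            return;
        }
        /* ... normal same-group deletion path ... */
    }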

This must be backported to 2.7, along with this previous patch:

  BUG/MINOR: fd: used the update list from the fd's group instead of tgid
2023-02-27 19:26:42 +01:00
Willy Tarreau
c0f6f5755b BUG/MINOR: fd: used the update list from the fd's group instead of tgid
In _fd_delete_orphan() we try to remove the FD from its update list
which is supposed to be the current thread group's. However the function
might be called from another group during stopping or under isolation,
so the FD is not queued in the current group's update list but in its
own group's list. Let's retrieve the group from the FD instead of using
tgid.

This should have no impact on existing code since there is no code path
calling fd_delete() under thread isolation for now, and other cases are
blocked in fd_delete().

This must be backported to 2.7.
2023-02-27 19:26:41 +01:00
Aurelien DARRAGON
e51891a01d BUG/MEDIUM: fd: avoid infinite loops in fd_add_to_fd_list and fd_rm_from_fd_list
With 4d9888c ("CLEANUP: fd: get rid of the __GET_{NEXT,PREV} macros") some
"volatile" keywords were dropped at various assignment places in fd's code.

In fd_add_to_fd_list() and fd_rm_from_fd_list(), because of the absence of
the "volatile" keyword, the compiler was able to perform some code
optimizations that prevented the prev and next variables from being
reloaded between locking attempts (goto loop).

The result was that fd_add_to_fd_list() and fd_rm_from_fd_list() could
enter infinite loops, preventing other threads from working further,
ultimately
resulting in the watchdog being triggered as described in GH #2011.

To fix this, we made sure to re-audit 4d9888c in order to restore the
required memory barriers / compiler hints to prevent the compiler from
mis-optimizing the code around the fd's locks. That is: using atomic
loads to fetch the prev and next values, and restoring the "volatile"
cast for the cur_list.ent variable assignment in fd_rm_from_fd_list().

Big thanks to @xanaxalan for his help and patience and to @wtarreau for his
findings and explanations in regard to compiler's optimizations.

This must be backported in 2.7 with 4d9888c ("CLEANUP: fd: get rid of the
__GET_{NEXT,PREV} macros")
2023-02-27 16:55:56 +01:00
Willy Tarreau
b2f38c13d1 BUG/MINOR: thread: always reload threads_enabled in loops
A few loops waiting for threads to synchronize such as thread_isolate()
rightfully filter the thread masks via the threads_enabled field that
contains the list of enabled threads. However, it doesn't use an atomic
load on it. Before 2.7, the equivalent variables were marked as volatile
and were always reloaded. In 2.7 they're fields in ha_tgroup_ctx[], and
the risk that the compiler keeps them in a register inside a loop is not
null at all. In practice when ha_thread_relax() calls sched_yield() or
an x86 PAUSE instruction, it could be verified that the variable is
always reloaded. If these are avoided (e.g. an architecture providing
neither solution), it's visible in the asm code that the variables are
not reloaded. In this case, if a thread exits just between the moment
the two values are read, the loop could spin forever.

This patch adds the required _HA_ATOMIC_LOAD() on the relevant
threads_enabled fields. It must be backported to 2.7.
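As a standalone model, the waiting loops now follow this pattern (names
are illustrative):

    #include <stdatomic.h>

    static void wait_for_others(_Atomic unsigned long *threads_enabled,
                                _Atomic unsigned long *threads_harmless)
    {
        /* both masks are reloaded on each iteration; without the atomic
         * load the compiler may legally cache them in registers */
        while (atomic_load(threads_harmless) != atomic_load(threads_enabled))
            ;  /* ha_thread_relax() in the real code */
    }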
2023-01-19 19:22:17 +01:00
Willy Tarreau
ad90110338 BUG/MEDIUM: fd/threads: fix again incorrect thread selection in wakeup broadcast
Commit c1640f79f ("BUG/MEDIUM: fd/threads: fix incorrect thread selection
in wakeup broadcast") fixed an incorrect range being used to pick a thread
when broadcasting a wakeup for a foreign thread, but the selection was still
wrong as the number of threads and their mask was taken from the current
thread instead of the target thread. In addition, the code dealing with the
wakeup of a thread from the same group was still relying on MAX_THREADS
instead of tg->count.

This could theoretically cause random crashes with more than one thread
group, though this was never encountered.

This needs to be backported to 2.7.
2023-01-19 19:22:17 +01:00
Willy Tarreau
80ff10c81d BUG/MINOR: fd: avoid bad tgid assertion in fd_delete() from deinit()
In 2.7, commit 0dc1cc93b ("MAJOR: fd: grab the tgid before manipulating
running") added a check to make sure we never try to delete an FD from
the wrong thread group. It already handles the specific case of an
isolated thread (e.g. stop a listener from the CLI) but forgot to take
into account the deinit() code iterating over all idle server connections
to close them. This results in the crash below during deinit() if thread
groups are enabled and idle connections exist on a thread group higher
than 1.

  [WARNING]  (15711) : Proxy decrypt stopped (cumulated conns: FE: 64, BE: 374511).
  [WARNING]  (15711) : Proxy stats stopped (cumulated conns: FE: 0, BE: 0).
  [WARNING]  (15711) : Proxy GLOBAL stopped (cumulated conns: FE: 0, BE: 0).

  FATAL: bug condition "fd_tgid(fd) != ti->tgid && !thread_isolated()" matched at src/fd.c:369
    call trace(11):
    |       0x4a6060 [c6 04 25 01 00 00 00 00]: main-0x1d60
    |       0x67fcc6 [c7 43 68 fd ad de fd 5b]: sock_conn_ctrl_close+0x16/0x1f
    |       0x59e6f5 [48 89 ef e8 83 65 11 00]: main+0xf6935
    |       0x60ad16 [48 8b 1b 48 81 fb a0 91]: free_proxy+0x716/0xb35
    |       0x62750e [48 85 db 74 35 48 89 dd]: deinit+0xbe/0x87a
    |       0x627ce2 [89 ef e8 97 76 e7 ff 0f]: deinit_and_exit+0x12/0x19
    |       0x4a9694 [bf e6 ff 9d 00 44 89 6c]: main+0x18d4/0x2c1a

There's no harm though since all traffic already ended. This must be
backported to 2.7.
2023-01-05 18:06:58 +01:00
Willy Tarreau
922a907926 MINOR: fd: add a new function to only raise RLIMIT_NOFILE
In issue #1866 it was reported that under docker a user cannot lower
the number of FDs needed. It looks like a restriction imposed by this
environment, but it results in an error while it ought not to in the
case of shrinking.

This patch adds a new function raise_rlim_nofile() that takes the desired
new setting, compares it to the current one, and only calls setrlimit() if
one of the values in the new setting is larger than the older one. As such
it will continue to emit warnings and errors in case of failure to raise
the limit but will never shrink it.
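The plausible shape of the function is the following sketch (not the
exact code):

    #include <sys/resource.h>

    /* call setrlimit() only if the new limits are higher than the
     * current ones, so that shrinking can never fail */
    static int raise_rlim_nofile(struct rlimit *old_limit,
                                 const struct rlimit *new_limit)
    {
        struct rlimit limit = { 0, 0 };
        int ret = 0;

        if (getrlimit(RLIMIT_NOFILE, &limit) == 0 &&
            (limit.rlim_cur < new_limit->rlim_cur ||
             limit.rlim_max < new_limit->rlim_max))
            ret = setrlimit(RLIMIT_NOFILE, new_limit);

        if (old_limit)
            *old_limit = limit;
        return ret;
    }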

This patch is only preliminary to another one, but will have to be
backported where relevant (likely only 2.6).
2022-10-04 08:38:47 +02:00
Willy Tarreau
85af760704 BUILD: fd: fix a build warning on the DWCAS
Ilya reported in issue #1816 a build warning on armhf (promoted to error
here since -Werror):

  src/fd.c: In function fd_rm_from_fd_list:
  src/fd.c:209:87: error: passing argument 3 of __ha_cas_dw discards volatile qualifier from pointer target type [-Werror=discarded-array-qualifiers]
    209 |    unlikely(!_HA_ATOMIC_DWCAS(((long *)&fdtab[fd].update), (uint32_t *)&cur_list.u32, &next_list.u32))
        |                                                                                       ^~~~~~~~~~~~~~

This happens only on such an architecture because the DWCAS requires the
pointer not the value, and gcc seems to be needlessly picky about reading
a const from a volatile! This may safely be backported to older versions.
2022-09-17 11:20:44 +02:00
Willy Tarreau
e6ca435c04 BUG/MEDIUM: poller: use fd_delete() to release the poller pipes
The poller pipes needed to communicate between multiple threads are
allocated in init_pollers_per_thread() and released in
deinit_pollers_per_thread(). The former adds them via fd_insert()
so that they are known, but the latter only closes them using a
regular close().

This asymmetry represents a problem, because we have in the fdtab[]
an entry for something that may disappear when one thread leaves, and
since these FD numbers are very low, there is a very high likelihood
that they are immediately reassigned to another thread trying to
connect() to a server or just sending a health check. In this case,
the other thread is going to fd_insert() the fd and the recently
added consistency checks will notice that ->owner is not NULL and
will crash. We just need to use fd_delete() here to match fd_insert().
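Condensed, the fix amounts to this (the pipe variable names are
illustrative):

    /* in deinit_pollers_per_thread(): FDs that entered via fd_insert()
     * must leave via fd_delete(), never via a bare close(), so that
     * fdtab[] never keeps a stale ->owner */
    fd_delete(rd_pipe);
    fd_delete(wr_pipe);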

Note that this test was added in 2.7-dev2 by commit 36d9097cf
("MINOR: fd: Add BUG_ON checks on fd_insert()") which was backported
to 2.4 as a safety measure (since it allowed to catch particularly
serious issues). The patch in itself isn't wrong, it just revealed
a long-dormant bug (been there since 1.9-dev1, 4 years ago). As such
the current patch needs to be backported wherever the commit above
is backported.

Many thanks to Christian Ruppert for providing detailed traces in
github issue #1807 and Cedric Paillet for bringing his complementary
analysis that helped to understand the required conditions for this
issue to happen (fast health checks @100ms + randomly long connections
~7s + fast reloads every second + hard-stop-after 5s were necessary
on the dev's machine to trigger it from time to time).
2022-08-10 17:25:23 +02:00
Willy Tarreau
b983145837 BUG/MINOR: fd: always remove late updates when freeing fd_updt[]
Christopher found that since commit 8e2c0fa8e ("MINOR: fd: delete unused
updates on close()") we may crash in a late stop due to an fd_delete()
in the main thread performed after all threads have deleted the fd_updt[]
array. Prior to that commit that didn't happen because we didn't touch
the updates on this path, but now it may happen. We don't care about these
ones anyway since the poller is stopped, so let's just wipe them by
resetting their counter before freeing the array.

No backport is needed as this is only 2.7.
2022-07-26 19:06:17 +02:00
Willy Tarreau
c1640f79fe BUG/MEDIUM: fd/threads: fix incorrect thread selection in wakeup broadcast
In commit cfdd20a0b ("MEDIUM: fd: support broadcasting updates for foreign
groups in updt_fd_polling") we decided to pick a random thread number among
a set of candidates for a wakeup in case we need an instant change. But the
thread count range was wrong (MAX_THREADS) instead of tg->count, resulting
in random crashes when thread groups are > 1 and MAX_THREADS > 64.

No backport is needed, this was introduced in 2.7-dev2.
2022-07-19 16:01:04 +02:00
Willy Tarreau
cfdd20a0b2 MEDIUM: fd: support broadcasting updates for foreign groups in updt_fd_polling
We're still facing the situation where it's impossible to update an FD
for a foreign group. That's of particular concern when disabling/enabling
listeners (e.g. pause/resume on signals) since we don't decide which thread
gets the signal and it needs to process all listeners at once.

Fortunately, not that much is unprotected in FDs. This patch adds a test for
tgid's equality in updt_fd_polling() so that if a change is applied for a
foreign group, then it's detected and taken care of separately. The method
consists in forcing the update on all bound threads in this group, adding it
to the group's update_list, and sending a wake-up as would be done for a
remote thread in the local group, except that this is done by grabbing a
reference to the FD's tgid.

Thanks to this, SIGTTOU/SIGTTIN now work for nbtgroups > 1 (after having
been temporarily broken by "MEDIUM: fd/poller: make the update-list
per-group").
2022-07-15 20:25:41 +02:00
Willy Tarreau
9baff4ffd9 MEDIUM: fd: support stopping FDs during starting
There's a nasty case during boot concerning the master process. It stops
all listeners from the main thread, and as such we're seeing calls to
fd_delete() from a thread that doesn't match the FD's mask, but more
importantly from a group that doesn't match either. Fortunately this
happens in a process that doesn't see the threads creation, so the FDs
are left intact in the table and we can overwrite the tgid there.

The approach is ugly; it probably shows that we should use a dummy
value for the tgid during boot, that would be replaced once the FDs
migrate to their target, but we also need a way to make sure not to
miss them. Also that doesn't solve the possibility of closing a
listener at run time from the wrong thread group.
2022-07-15 20:16:30 +02:00
Willy Tarreau
88c4c14050 MINOR: fd: add fd_reregister_all() to deal with boot-time FDs
At boot the pollers are allocated for each thread and they need to
reprogram updates for all FDs they will manage. This code is not
trivial, especially when trying to respect thread groups, so we'd
rather avoid duplicating it.

Let's centralize this into fd.c with this function. It skips closed
FDs and those whose thread mask or thread group doesn't match the
requested ones, and performs the update under thread-group protection
when required.
2022-07-15 20:16:30 +02:00
Willy Tarreau
0b51eab764 MEDIUM: fd: quit fd_update_events() when FD is closed
The IOCB might have closed the FD itself, so it's not an error to
have fd.tgid==0 or anything else, nor to have a null running_mask.

In fact there are different conditions under which we can leave the
IOCB, all of them enumerated in the code's comments (namely: FD still
valid and used, hence it has the running bit; FD closed but not yet
reassigned, thus running==0; FD closed and reassigned, hence a different
tgid, and running becomes irrelevant, just like all other masks). For
this reason we have no other solution but to try to grab the tgid on
return before checking the other bits. In practice it doesn't represent
a big cost, because if the FD was closed and reassigned, it's instantly
detected and the bit is immediately released without blocking other
threads, and if the FD wasn't closed this doesn't prevent it from
being migrated to another thread. In the worst case a close by another
thread after a migration will be postponed till the moment the running
bit is cleared, which is the same as before.
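A condensed sketch of the return path (fd_grab_tgid(), fd_drop_tgid()
and fdtab[].owner are real haproxy names; the error paths are
simplified):

    /* the IOCB may have closed or even reassigned the FD: re-validate
     * the tgid before touching any mask */
    if (!fd_grab_tgid(fd, tgid))
        return;            /* closed and reassigned: masks are not ours */

    if (!fdtab[fd].owner) {
        fd_drop_tgid(fd);
        return;            /* closed but not yet reused: nothing to do */
    }

    /* still ours: the running bit is valid, proceed with mask updates */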
2022-07-15 20:16:30 +02:00
Willy Tarreau
ddedc16624 MEDIUM: fd: make fd_insert/fd_delete atomically update fd.tgid
These functions need to set/reset the FD's tgid but when they're called
there may still be wakeups on other threads that discover late updates
and have to touch the tgid at the same time. As such, it is not possible
to just read/write the tgid there. It must only be done using operations
that are compatible with what other threads may be doing.

As we're using inc/dec on the refcount, it's safe to AND the area to zero
the lower part when resetting the value. However, in order to set the
value, there's no other choice but fd_claim_tgid() which will assign it
only if possible (via a CAS). This is convenient in the end because it
protects the FD's masks from being modified by late threads, so while
we hold this refcount we can safely reset the thread_mask and a few other
elements. A debug test for non-null masks was added to fd_insert() as it
must not be possible to face this situation thanks to the protection
offered by the tgid.
2022-07-15 20:16:30 +02:00
Willy Tarreau
27a3245599 MEDIUM: fd: make fd_insert() take local thread masks
fd_insert() was already given a thread group ID and a global thread mask.
Now we're changing the few callers to take the group-local thread mask
instead. It's passed directly into the FD's thread mask. Just like for
previous commit, it must not change anything when a single group is
configured.
2022-07-15 20:16:30 +02:00
Willy Tarreau
3638d174e5 MEDIUM: fd: make thread_mask now represent group-local IDs
With the change that was started on other masks, the thread mask was
still not fully converted, sometimes being used as a global mask and
sometimes as a local one. This finishes the code modifications so that
the mask is always considered as a group-local mask. This doesn't
change anything as long as there's a single group, but is necessary
for groups 2 and above since it's used against running_mask and so on.
2022-07-15 20:16:30 +02:00
Willy Tarreau
d6e1987612 MINOR: fd: make fd_clr_running() return the previous value instead
It's an AND, so it destroys information, and due to this there's a call
place where we have to perform two reads to know the previous value and
then to change it. With a fetch-and-and instead, in a single operation
we can know if the bit was previously present, which is more efficient.
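Modelled standalone, the pattern becomes:

    #include <stdatomic.h>

    /* clear our bit and learn in the same operation whether we were
     * the last running thread */
    static unsigned long fd_clr_running_model(_Atomic unsigned long *running,
                                              unsigned long tid_bit)
    {
        return atomic_fetch_and(running, ~tid_bit);  /* previous value */
    }

    /* caller: if (fd_clr_running_model(&m, tid_bit) == tid_bit)
     *             we were alone on this FD */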
2022-07-15 20:16:30 +02:00
Willy Tarreau
a707d02657 MEDIUM: fd/poller: turn running_mask to group-local IDs
From now on, the FD's running_mask only refers to local thread IDs. However,
there remains a limitation, in updt_fd_polling(), we temporarily have to
check and set shared FDs against .thread_mask, which still contains global
ones. As such, nbtgroups > 1 may break (but this is not yet supported without
special build options).
2022-07-15 20:16:30 +02:00
Willy Tarreau
6d3c501c08 MEDIUM: fd/poller: turn update_mask to group-local IDs
From now on, the FD's update_mask only refers to local thread IDs. However,
there remains a limitation, in updt_fd_polling(), we temporarily have to
check and set shared FDs against .thread_mask, which still contains global
ones. As such, nbtgroups > 1 may break (but this is not yet supported without
special build options).
2022-07-15 20:16:30 +02:00
Willy Tarreau
0dc1cc93b6 MAJOR: fd: grab the tgid before manipulating running
We now grab a reference to the FD's tgid before manipulating the
running_mask so that we're certain it corresponds to our own group
(hence bits), and we drop it once we've set the bit. For now there's
no measurable performance impact in doing this, which is great. The
lock can be observed by perf top as taking a small share of the time
spent in fd_update_events(), itself taking no more than 0.28% of CPU
under 8 threads.

However due to the fact that the thread groups are not yet properly
spread across the pollers and the thread masks are still wrong, this
will trigger some BUG_ON() in fd_insert() after a few tens of thousands
of connections when threads other than those of group 1 are reached,
and this is expected.
2022-07-15 20:16:30 +02:00
Willy Tarreau
9464bb1f05 MEDIUM: fd: add the tgid to the fd and pass it to fd_insert()
The file descriptors will need to know the thread group ID in addition
to the mask. This extends fd_insert() to take the tgid, and will store
it into the FD.

In the FD, the tgid is stored as a combination of tgid on the lower 16
bits and a refcount on the higher 16 bits. This allows to know when it's
really possible to trust the tgid and the running mask. If a refcount is
higher than 1 it indeed indicates another thread else might be in the
process of updating these values.

Since a closed FD must necessarily have a zero refcount, a test was
added to fd_insert() to make sure that it is the case.
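A standalone model of the packed field and of the grab operation (the
values and names are illustrative):

    #include <stdatomic.h>
    #include <stdbool.h>

    #define TGID_MASK 0xffffu   /* tgid in the low 16 bits */
    #define TGID_REF  0x10000u  /* one refcount unit, high 16 bits */

    /* pin the FD's group: while we hold a reference, the tgid and the
     * masks cannot be reassigned under us */
    static bool fd_grab_tgid_model(_Atomic unsigned int *refc_tgid,
                                   unsigned int tgid)
    {
        unsigned int old = atomic_load(refc_tgid);

        do {
            if ((old & TGID_MASK) != tgid)
                return false;   /* FD now belongs to another group */
        } while (!atomic_compare_exchange_weak(refc_tgid, &old,
                                               old + TGID_REF));
        return true;
    }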
2022-07-15 19:58:06 +02:00
Willy Tarreau
8e2c0fa8e5 MINOR: fd: delete unused updates on close()
After a poller's ->clo() was called to completely terminate operations
on an FD, there's no reason for keeping updates on this FD, so if any
updates were already programmed it would be nice if we could delete them.

Tests show that __fd_clo() is called roughly half of the time with the
last FD from the local update list, which possibly makes sense if a close
has to appear after a polling change resulting from an incomplete read or
the end of a send().

We can detect this and remove the last entry, which gives less work to
do during the update() call, and eliminates most of the poll_drop_fd
event reports.
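Condensed, the optimization is (fd_updt/fd_nbupdt are haproxy's
per-thread update list and its counter):

    /* in __fd_clo(): if the FD being closed is the last queued update,
     * simply drop the entry instead of letting the poller report it
     * as a dropped event */
    if (fd_nbupdt > 0 && fd_updt[fd_nbupdt - 1] == fd)
        fd_nbupdt--;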

Note that while tempting, this must not be backported because it's only
safe to be done now that fd_delete_orphan() clears the update mask as
we need to be certain not to miss it:
  - if the update mask is kept up with no entry, we can miss future
    updates;
  - if the update mask is cleared too fast, it may result in failure
    to add a shared event.
2022-07-15 19:58:06 +02:00
Willy Tarreau
35ee710ece MEDIUM: fd/poller: make the update-list per-group
The update-list needs to be per-group because its inspection is based
on a mask and we need to be certain when scanning it if a mask is for
the same thread or another one. Once per-group there's no doubt about
it, even if the FD's polling changes, the entry remains valid. It will
be needed to check the tgid though.

Note that a soft-stop or pause/resume might not necessarily work here
with tgroups>1, because the operation might be delivered to a thread
that doesn't belong to the group and whose update mask will not reflect
one that is interesting here. We can't do better at this stage.
2022-07-15 19:57:28 +02:00
Willy Tarreau
2f36d902aa MAJOR: fd: remove pending updates upon real close
Dealing with long-lasting updates that outlive a close() is always
going to be quite a problem, not because of the thread that will
discover such updates late, but mostly due to the shared update_list
that will have an entry on hold making it difficult to reuse it, and
requiring that the fd's tgid is changed and the update_mask reset
from a safe location.

After careful inspection, it turns out that all our pollers that support
automatic event removal upon close() do not need any extra bookkeeping,
and that poll and select that use an internal representation already
provide a poller->clo() callback that is already used to update the
local event. As such, it is already safe to reset the update mask and
to remove the event from the shared list just before the final close,
because nothing remains to be done with this FD by the poller.

Doing so considerably simplifies the handling of updates, which will
only have to be inspected by the pollers, while the writers can continue
to consider that the entries are always valid. Another benefit is that
it will be possible to reduce contention on the update_list by just
having one update_list per group (left to be done later if needed).
2022-07-15 19:43:10 +02:00
Willy Tarreau
b1093c6ba2 MEDIUM: poller: program the update in fd_update_events() for a migrated FD
When an FD is migrated, all pollers program an update. That's useless
code duplication, and when thread groups will be supported, this will
require an extra round of locking just to verify the update_mask on
return. Let's just program the update directly from fd_update_events(),
as it already does for closed FDs; this becomes more logical.
2022-07-15 19:43:10 +02:00
Willy Tarreau
4d9888ca69 CLEANUP: fd: get rid of the __GET_{NEXT,PREV} macros
They were initially made to deal with both the cache and the update list
but there's no cache anymore and keeping them for the update list adds a
lot of obfuscation that is really not desired. Let's get rid of them now.

Their purpose was simply to get a pointer to fdtab[fd].update.{,next,prev}
in order to perform atomic tests and modifications. The offset passed in
argument to the functions (fd_add_to_fd_list() and fd_rm_from_fd_list())
was the offset of the ->update field in fdtab, and as it's not used anymore
it was removed. This also removes a number of casts, though those used by
the atomic ops have to remain since only scalars are supported.
2022-07-15 19:41:26 +02:00
Willy Tarreau
e7475c8e79 MEDIUM: tasks/fd: replace sleeping_thread_mask with a TH_FL_SLEEPING flag
Every single remaining use of sleeping_thread_mask was to test or set a
single thread. We can now add a per-thread flag to indicate a
thread is sleeping, and remove this shared mask.

The wake_thread() function now always performs an atomic fetch-and-or
instead of a first load then an atomic OR. That's cleaner and more
reliable.

This is not easy to test, as broadcast FD events are rare. The good
way to test for this is to run a very low rate-limited frontend with
a listener that listens to the fewest possible threads (2), and to
send it only 1 connection at a time. The listener will periodically
pause and the wakeup task will sometimes wake up on a random thread
and will call wake_thread():

   frontend test
        bind :8888 maxconn 10 thread 1-2
        rate-limit sessions 5

Alternately, disabling/enabling a frontend in loops via the CLI also
broadcasts such events, but they're more difficult to observe since
this causes connection failures.
2022-07-01 19:15:14 +02:00
Willy Tarreau
555c192d14 MINOR: poller: update_fd_polling: wake a random other thread
When enabling an FD that's only bound to another thread, instead of
always picking the first one, let's pick a random one. This is rarely
used (enabling a frontend, or session rate-limiting period ending),
and has a better chance of avoiding some obscure corner cases that
could degenerate into a poorly distributed load.
2022-07-01 19:15:14 +02:00
Willy Tarreau
962e5ba72b MEDIUM: polling: make update_fd_polling() not care about sleeping threads
Till now, update_fd_polling() used to check if all the target threads were
sleeping, and only then would wake an owning thread up. This causes several
problems, among which the need for the sleeping_thread_mask and the fact
that by the time we wake one thread up, its state has already changed.

This commit changes this by leaving it to wake_thread() to perform this
check on the selected thread, since wake_thread() is already capable of
doing this now. Concretely speaking, for updt_fd_polling() it will mean
performing one computation of an ffsl() before knowing the sleeping status
on a global FD state change (which is very rare and not important here,
as it basically happens after relaxing a rate-limit (i.e. once a second
at best) or after enabling a frontend from the CLI); thus we don't care.
2022-07-01 19:15:14 +02:00
Willy Tarreau
058b2c1015 MINOR: poller: centralize poll return handling
When returning from the polling syscall, all pollers have a certain
dance to follow, made of wall clock updates, thread harmless updates,
idle time management and sleeping mask updates. Let's have a centralized
function to deal with all of this boring stuff: fd_leaving_poll(), and
make all the pollers use it.
2022-07-01 19:15:14 +02:00
Willy Tarreau
bdcd32598f MINOR: thread: only use atomic ops to touch the flags
The thread flags are touched a little bit by other threads, e.g. the STUCK
flag may be set by other ones, and they're watched a little bit. As such
we need to use atomic ops only to manipulate them. Most places were already
using them, but here we generalize the practice. Only ha_thread_dump() does
not change because it's run under isolation.
2022-07-01 19:15:14 +02:00
Willy Tarreau
f3efef4d60 MINOR: thread: make wake_thread() take care of the sleeping threads mask
Almost every call place of wake_thread() checks for sleeping threads and
clears the sleeping mask itself, while the function is solely used for
this. Let's move the check and the clearing of the bit inside the function
itself. Note that updt_fd_polling() still performs the check because its
rules are a bit different.
2022-07-01 19:15:14 +02:00
Emeric Brun
f41a3f6762 MINOR: fd: add a new FD_DISOWN flag to prevent from closing a deleted FD
Some FDs might be offered to some external code (external libraries)
which will deal with them until they close them. As such we must not
close them upon fd_delete() but we need to delete them anyway so that
they do not appear anymore in the fdtab. This used to be handled by
fd_remove() before 2.3 but we don't have this anymore.

This patch introduces a new flag FD_DISOWN to let fd_delete() know that
the core doesn't own the fd and it must not be closed upon removal from
the fdtab. This way it's totally unregistered from the poller but still
open.
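A hypothetical call site for an FD handed over to an external library
would look like:

    /* keep the OS-level descriptor open for the library, but remove it
     * from haproxy's bookkeeping and from the poller */
    HA_ATOMIC_OR(&fdtab[fd].state, FD_DISOWN);
    fd_delete(fd);   /* unregisters the FD but skips the close() */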

This patch must be backported to branches >= 2.3 because it will be
needed to fix a bug affecting SSL async. It should be adapted for 2.3
because state flags were stored in a different way (via bits in the
structure).
2022-07-01 17:41:40 +02:00
Willy Tarreau
974358954b BUILD: fd: disguise the fd_set_nonblock/cloexec result
We thought that we could get rid of some DISGUISE() with commit a80e4a354
("MINOR: fd: add functions to set O_NONBLOCK and FD_CLOEXEC") thanks to
the calls being in a function, but that was without counting on Coverity.
Let's put it directly in the function since most if not all callers don't
care about this result.
2022-04-27 10:52:21 +02:00
Willy Tarreau
382474348c CLEANUP: tree-wide: use fd_set_nonblock() and fd_set_cloexec()
This gets rid of most open-coded fcntl() calls, some of which were passed
through DISGUISE() to avoid a useless test. The FD_CLOEXEC was most often
set without preserving previous flags, which could become a problem once
new flags are created. Now this will not happen anymore.
2022-04-26 10:59:48 +02:00
Willy Tarreau
a80e4a3546 MINOR: fd: add functions to set O_NONBLOCK and FD_CLOEXEC
Instead of seeing each location manipulate the fcntl() themselves and
often forget to check previous flags, let's centralize the functions to
do this. It also allows to drop fcntl.h from most call places and will
ease the adoption of different OS-specific mechanisms if needed. Note
that the fd_set_nonblock() function purposely doesn't check the previous
flags as it's meant to be used on new FDs only.
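A plausible shape for the two helpers (a sketch, not the exact code):

    #include <fcntl.h>

    static inline int fd_set_nonblock(int fd)
    {
        /* meant for new FDs only: no need to preserve previous flags */
        return fcntl(fd, F_SETFL, O_NONBLOCK);
    }

    static inline int fd_set_cloexec(int fd)
    {
        int flags = fcntl(fd, F_GETFD);  /* preserve already-set flags */

        if (flags < 0)
            return flags;
        return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
    }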
2022-04-26 10:59:48 +02:00
Willy Tarreau
acef5e27b0 MINOR: tree-wide: always consider EWOULDBLOCK in addition to EAGAIN
Some older systems may routinely return EWOULDBLOCK for some syscalls
while we tend to check only for EAGAIN nowadays. Modern systems define
EWOULDBLOCK as EAGAIN so that solves it, but on a few older ones (AIX,
VMS etc) both are different, and for portability we'd need to test for
both, or we never know if we risk confusing some status codes with
plain errors.

There were few entries; the most annoying ones were the switch/case
statements, because they require adding the entry only when it differs,
but the other ones were really trivial.
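The switch/case pattern in question (a sketch of the fragment):

    #include <errno.h>

    switch (errno) {
    case EAGAIN:
    #if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:   /* only on systems where both values differ,
                         * to avoid a duplicate case value elsewhere */
    #endif
        /* retryable: not a real error */
        break;
    default:
        /* genuine error handling */
        break;
    }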
2022-04-25 20:32:15 +02:00
Willy Tarreau
9aa324de2d DEBUG: fd: make sure we never try to insert/delete an impossible FD number
It's among the cases that would provoke memory corruption, let's add
some tests against negative FDs and those larger than the table. This
must never ever happen and would currently result in silent corruption
or a crash. Better have a noticeable one exhibiting the call chain if
that were to happen.
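Condensed, the check is (BUG_ON() and global.maxsock are haproxy's):

    /* at the top of fd_insert()/fd_delete(): any out-of-range FD means
     * memory corruption is about to happen, so crash loudly instead */
    BUG_ON(fd < 0 || fd >= global.maxsock);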
2022-01-31 21:00:35 +01:00
Willy Tarreau
97ea9c49f1 BUG/MEDIUM: fd: always align fdtab[] to 64 bytes
There's a risk that fdtab is not 64-byte aligned. The first effect is that
it may cause false sharing between cache lines resulting in contention
when adjacent FDs are used by different threads. The second is related
to what is explained in commit "BUG/MAJOR: compiler: relax alignment
constraints on certain structures", i.e. that modern compilers might
make use of aligned vector operations to zero some entries, and would
crash. We do not use any memset() or the like on fdtab, so the risk is
almost nonexistent, but that's not a reason for violating valid
assumptions.

This patch addresses this by allocating 64 extra bytes and aligning the
structure manually (this is an extremely cheap solution for this specific
case). The original address is stored in a new variable "fdtab_addr" and
is the one that gets freed. This remains extremely simple and should be
easily backportable. A dedicated aligned allocator later would help, of
course.
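The manual alignment trick, modelled standalone (names are
illustrative):

    #include <stdint.h>
    #include <stdlib.h>

    static void *fdtab_addr;  /* original pointer, the one to free() */

    /* over-allocate by 64 bytes and round the usable pointer up to the
     * next 64-byte boundary */
    static void *alloc_fdtab_aligned(size_t size)
    {
        fdtab_addr = calloc(1, size + 64);
        if (!fdtab_addr)
            return NULL;
        return (void *)(((uintptr_t)fdtab_addr + 63) & ~(uintptr_t)63);
    }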

This needs to be backported as far as 2.2. No issue related to this was
reported yet, but it could very well happen as compilers evolve. In
addition this should preserve high performance across restarts (i.e.
no more dependency on allocator's alignment).
2022-01-27 16:28:10 +01:00
Willy Tarreau
3a6af1e5e8 MINOR: fd: register the write side of the poller pipe as well
The poller's pipe was only registered on the read side since we don't
need to poll to write on it. But this leaves some FDs unaccounted for,
so it's better to also register the write side with no event. This allows
to show them in "show fd" and to avoid dumping them as unhandled FDs.

Note that the only other type of unhandled FDs left are:
  - stdin/stdout/stderr
  - epoll FDs

The latter could be registered upon startup though, but at least a dummy
handler would be needed to keep the fdtab clean.
2022-01-24 20:41:25 +01:00
Willy Tarreau
a0b99536c8 REORG: thread/sched: move the thread_info flags to the thread_ctx
The TI_FL_STUCK flag is manipulated by the watchdog and scheduler
and describes the apparent life/death of a thread so it changes
all the time and it makes sense to move it to the thread's context
for an active thread.
2021-10-08 17:22:26 +02:00
Willy Tarreau
b63888c67c REORG: fd: uninline compute_poll_timeout()
It's not needed to inline it at all (one call per loop) and it introduces
dependencies; let's move it to fd.c.

Removing the few remaining includes that came with it further reduced
by ~0.2% the LoC and the build time is now below 6s.
2021-10-07 01:41:14 +02:00
Willy Tarreau
5d9ddc5442 BUILD: tree-wide: add several missing activity.h
A number of files currently access activity counters but rely on their
definitions to be inherited from other files (task.c, backend.c, hlua.c,
sock.c, pool.c, stats.c, fd.c).
2021-10-07 01:36:51 +02:00
Willy Tarreau
87063a7da1 BUILD: fd: remove unused variable totlen in fd_write_frag_line()
Ilya reports in GH #1392 that clang 13 complains about totlen being
calculated and not used in fd_write_frag_line(), which is true. It's
a leftover of some older code.
2021-09-17 12:00:27 +02:00