haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-13 18:46:57 +02:00

Author	SHA1	Message	Date
Willy Tarreau	b2f38c13d1	BUG/MINOR: thread: always reload threads_enabled in loops A few loops waiting for threads to synchronize such as thread_isolate() rightfully filter the thread masks via the threads_enabled field that contains the list of enabled threads. However, it doesn't use an atomic load on it. Before 2.7, the equivalent variables were marked as volatile and were always reloaded. In 2.7 they're fields in ha_tgroup_ctx[], and the risk that the compiler keeps them in a register inside a loop is not null at all. In practice when ha_thread_relax() calls sched_yield() or an x86 PAUSE instruction, it could be verified that the variable is always reloaded. If these are avoided (e.g. architecture providing neither solution), it's visible in asm code that the variables are not reloaded. In this case, if a thread exits just between the moments the two values are read, the loop could spin forever. This patch adds the required _HA_ATOMIC_LOAD() on the relevant threads_enabled fields. It must be backported to 2.7.	2023-01-19 19:22:17 +01:00
Willy Tarreau	922a907926	MINOR: fd: add a new function to only raise RLIMIT_NOFILE In issue #1866 an issue was reported under docker, by which a user cannot lower the number of FD needed. It looks like a restriction imposed in this environment, but it results in an error while it ought not have to in the case of shrinking. This patch adds a new function raise_rlim_nofile() that takes the desired new setting, compares it to the current one, and only calls setrlimit() if one of the values in the new setting is larger than the older one. As such it will continue to emit warnings and errors in case of failure to raise the limit but will never shrink it. This patch is only preliminary to another one, but will have to be backported where relevant (likely only 2.6).	2022-10-04 08:38:47 +02:00
William Lallemand	dc66f2f97d	DEBUG: fd: split the fd check Split the BUG_ON(fd < 0 || fd >= global.maxsock) so it's easier to know if it quits because of a -1.	2022-07-26 10:35:24 +02:00
Willy Tarreau	51b1fcedeb	DEBUG: fd: detect possibly invalid tgid in fd_insert() Since the API is still a bit young, let's make sure nobody tries to assign an FD to a group not strictly 1..MAX_TGROUPS as that would indicate a bug. Note: some of these might be relaxed to BUG_ON_HOT() in the future	2022-07-25 15:47:45 +02:00
Christopher Faulet	7e94b40a22	BUG/MINOR: fd: Properly init the fd state in fd_insert() When a new fd is inserted in the fdtab array, its state is initialized. The "newstate" variable is used to compute the right state (0 by default, but FD_ET_POSSIBLE flag is set if edge-triggered is supported for the fd). However, this variable is never used and the fd state is always set to 0. Now, the fd state is initialized with "newstate" variable. This bug was introduced by commit `ddedc1662` ("MEDIUM: fd: make fd_insert/fd_delete atomically update fd.tgid"). No backport needed.	2022-07-19 12:11:04 +02:00
Willy Tarreau	88c4c14050	MINOR: fd: add fd_reregister_all() to deal with boot-time FDs At boot the pollers are allocated for each thread and they need to reprogram updates for all FDs they will manage. This code is not trivial, especially when trying to respect thread groups, so we'd rather avoid duplicating it. Let's centralize this into fd.c with this function. It avoids closed FDs, those whose thread mask doesn't match the requested one or whose thread group doesn't match the requested one, and performs the update if required under thread-group protection.	2022-07-15 20:16:30 +02:00
Willy Tarreau	ddedc16624	MEDIUM: fd: make fd_insert/fd_delete atomically update fd.tgid These functions need to set/reset the FD's tgid but when they're called there may still be wakeups on other threads that discover late updates and have to touch the tgid at the same time. As such, it is not possible to just read/write the tgid there. It must only be done using operations that are compatible with what other threads may be doing. As we're using inc/dec on the refcount, it's safe to AND the area to zero the lower part when resetting the value. However, in order to set the value, there's no other choice but fd_claim_tgid() which will assign it only if possible (via a CAS). This is convenient in the end because it protects the FD's masks from being modified by late threads, so while we hold this refcount we can safely reset the thread_mask and a few other elements. A debug test for non-null masks was added to fd_insert() as it must not be possible to face this situation thanks to the protection offered by the tgid.	2022-07-15 20:16:30 +02:00
Willy Tarreau	3638d174e5	MEDIUM: fd: make thread_mask now represent group-local IDs With the change that was started on other masks, the thread mask was still not fully converted, sometimes being used as a global mask and sometimes as a local one. This finishes the code modifications so that the mask is always considered as a group-local mask. This doesn't change anything as long as there's a single group, but is necessary for groups 2 and above since it's used against running_mask and so on.	2022-07-15 20:16:30 +02:00
Willy Tarreau	d6e1987612	MINOR: fd: make fd_clr_running() return the previous value instead It's an AND so it destroys information and due to this there's a call place where we have to perform two reads to know the previous value then to change it. With a fetch-and-and instead, in a single operation we can know if the bit was previously present, which is more efficient.	2022-07-15 20:16:30 +02:00
Willy Tarreau	a707d02657	MEDIUM: fd/poller: turn running_mask to group-local IDs From now on, the FD's running_mask only refers to local thread IDs. However, there remains a limitation, in updt_fd_polling(), we temporarily have to check and set shared FDs against .thread_mask, which still contains global ones. As such, nbtgroups > 1 may break (but this is not yet supported without special build options).	2022-07-15 20:16:30 +02:00
Willy Tarreau	6d3c501c08	MEDIUM: fd/poller: turn update_mask to group-local IDs From now on, the FD's update_mask only refers to local thread IDs. However, there remains a limitation, in updt_fd_polling(), we temporarily have to check and set shared FDs against .thread_mask, which still contains global ones. As such, nbtgroups > 1 may break (but this is not yet supported without special build options).	2022-07-15 20:16:30 +02:00
Willy Tarreau	ceffd17f52	MINOR: fd: add fd_get_running() to atomically return the running mask The running mask is only valid if the tgid is the expected one. This function takes a reference on the tgid before reading the running mask, so that both are checked at once. It returns either the mask or zero if the tgid differs, thus providing a simple way for a caller to check if it still holds the FD.	2022-07-15 20:16:30 +02:00
Willy Tarreau	080373ea38	MINOR: fd: add functions to manipulate the FD's tgid The FD's tgid is refcounted and must be atomically manipulated. Function fd_grab_tgid() will increase the refcount but only if the tgid matches the one in argument (likely the current one). fd_claim_tgid() will be used to self-assign the tgid after waiting for its refcount to reach zero. fd_drop_tgid() will be used to drop a temporarily held tgid. All of these are needed to prevent an FD from being reassigned to another group, either when inspecting/modifying the running_mask, or when checking for updates, in order to be certain that the mask being seen corresponds to the desired group. Note that once at least one bit is set in the running mask of an active FD, it cannot be closed, thus not migrated, thus the reference does not need to be held long.	2022-07-15 20:16:09 +02:00
Willy Tarreau	9464bb1f05	MEDIUM: fd: add the tgid to the fd and pass it to fd_insert() The file descriptors will need to know the thread group ID in addition to the mask. This extends fd_insert() to take the tgid, and will store it into the FD. In the FD, the tgid is stored as a combination of tgid on the lower 16 bits and a refcount on the higher 16 bits. This allows to know when it's really possible to trust the tgid and the running mask. If a refcount is higher than 1 it indeed indicates another thread else might be in the process of updating these values. Since a closed FD must necessarily have a zero refcount, a test was added to fd_insert() to make sure that it is the case.	2022-07-15 19:58:06 +02:00
Willy Tarreau	512dd2dc1c	MINOR: fd: make fd_insert() apply the thread mask itself It's a bit ugly to see that half of the callers of fd_insert() have to apply all_threads_mask themselves to the bit field they're passing, because usually it comes from a listener that may have other bits set. Let's make the function apply the mask itself.	2022-07-15 19:58:06 +02:00
Willy Tarreau	35ee710ece	MEDIUM: fd/poller: make the update-list per-group The update-list needs to be per-group because its inspection is based on a mask and we need to be certain when scanning it if a mask is for the same thread or another one. Once per-group there's no doubt about it, even if the FD's polling changes, the entry remains valid. It will be needed to check the tgid though. Note that a soft-stop or pause/resume might not necessarily work here with tgroups>1, because the operation might be delivered to a thread that doesn't belong to the group and whose update mask will not reflect one that is interesting here. We can't do better at this stage.	2022-07-15 19:57:28 +02:00
Willy Tarreau	82e378aa8a	MINOR: fd/thread: get rid of thread_mask() Since commit `d2494e048` ("BUG/MEDIUM: peers/config: properly set the thread mask") there must not remain any single case of a receiver that is bound nowhere, so there's no need anymore for thread_mask(). We're adding a test in fd_insert() to make sure this doesn't happen by accident though, but the function was removed and its rare uses were replaced with the original value of the bind_thread mask.	2022-07-15 19:43:10 +02:00
Willy Tarreau	4d9888ca69	CLEANUP: fd: get rid of the __GET_{NEXT,PREV} macros They were initially made to deal with both the cache and the update list but there's no cache anymore and keeping them for the update list adds a lot of obfuscation that is really not desired. Let's get rid of them now. Their purpose was simply to get a pointer to fdtab[fd].update.{,next,prev} in order to perform atomic tests and modifications. The offset passed in argument to the functions (fd_add_to_fd_list() and fd_rm_from_fd_list()) was the offset of the ->update field in fdtab, and as it's not used anymore it was removed. This also removes a number of casts, though those used by the atomic ops have to remain since only scalars are supported.	2022-07-15 19:41:26 +02:00
Emeric Brun	36d9097cf3	MINOR: fd: Add BUG_ON checks on fd_insert() This patch adds two BUG_ON on fd_insert() into the fdtab checking if the fd has been correctly re-initialized into the fdtab before a new insert. It will raise a BUG if we try to insert the same fd multiple times without an intermediate fd_delete(). First one checks that the owner for this fd in fdtab was reset to NULL. Second one checks that the state flags for this fd in fdtab was reset to 0. This patch could be backported on version >= 2.4	2022-07-05 05:18:51 +02:00
Willy Tarreau	e7475c8e79	MEDIUM: tasks/fd: replace sleeping_thread_mask with a TH_FL_SLEEPING flag Every single place where sleeping_thread_mask was still used was to test or set a single thread. We can now add a per-thread flag to indicate a thread is sleeping, and remove this shared mask. The wake_thread() function now always performs an atomic fetch-and-or instead of a first load then an atomic OR. That's cleaner and more reliable. This is not easy to test, as broadcast FD events are rare. The good way to test for this is to run a very low rate-limited frontend with a listener that listens to the fewest possible threads (2), and to send it only 1 connection at a time. The listener will periodically pause and the wakeup task will sometimes wake up on a random thread and will call wake_thread(): frontend test bind :8888 maxconn 10 thread 1-2 rate-limit sessions 5 Alternately, disabling/enabling a frontend in loops via the CLI also broadcasts such events, but they're more difficult to observe since this is causing connection failures.	2022-07-01 19:15:14 +02:00
Willy Tarreau	dce4ad755f	MEDIUM: thread: add a new per-thread flag TH_FL_NOTIFIED to remember wakeups Right now when an inter-thread wakeup happens, we preliminary check if the thread was asleep, and if so we wake the poller up and remove its bit from the sleeping mask. That's not very clean since the sleeping mask cannot be entirely trusted since a thread that's about to wake up will already have its sleeping bit removed. This patch adds a new per-thread flag (TH_FL_NOTIFIED) to remember that a thread was notified to wake up. It's cleared before checking the task lists last, so that new wakeups can be considered again (since wake_thread() is only used to notify about task wakeups and FD polling changes). This way we do not need to modify a remote thread's sleeping mask anymore. As such wake_thread() now only tests and sets the TH_FL_NOTIFIED flag but doesn't clear sleeping anymore.	2022-07-01 19:15:14 +02:00
Willy Tarreau	058b2c1015	MINOR: poller: centralize poll return handling When returning from the polling syscall, all pollers have a certain dance to follow, made of wall clock updates, thread harmless updates, idle time management and sleeping mask updates. Let's have a centralized function to deal with all of this boring stuff: fd_leaving_poll(), and make all the pollers use it.	2022-07-01 19:15:14 +02:00
Willy Tarreau	f3efef4d60	MINOR: thread: make wake_thread() take care of the sleeping threads mask Almost every call place of wake_thread() checks for sleeping threads and clears the sleeping mask itself, while the function is solely used for there. Let's move the check and the clearing of the bit inside the function itself. Note that updt_fd_polling() still performs the check because its rules are a bit different.	2022-07-01 19:15:14 +02:00
Willy Tarreau	a80e4a3546	MINOR: fd: add functions to set O_NONBLOCK and FD_CLOEXEC Instead of seeing each location manipulate the fcntl() themselves and often forget to check previous flags, let's centralize the functions to do this. It also allows to drop fcntl.h from most call places and will ease the adoption of different OS-specific mechanisms if needed. Note that the fd_set_nonblock() function purposely doesn't check the previous flags as it's meant to be used on new FDs only.	2022-04-26 10:59:48 +02:00
Willy Tarreau	9aa324de2d	DEBUG: fd: make sure we never try to insert/delete an impossible FD number It's among the cases that would provoke memory corruption, let's add some tests against negative FDs and those larger than the table. This must never ever happen and would currently result in silent corruption or a crash. Better have a noticeable one exhibiting the call chain if that were to happen.	2022-01-31 21:00:35 +01:00
Willy Tarreau	b63888c67c	REORG: fd: uninline compute_poll_timeout() It's not needed to inline it at all (one call per loop) and it introduces dependencies, let's move it to fd.c. Removing the few remaining includes that came with it further reduced by ~0.2% the LoC and the build time is now below 6s.	2021-10-07 01:41:14 +02:00
Willy Tarreau	c91f608bcb	CLEANUP: fd: do not include time.h It's not needed at all here.	2021-10-07 01:41:14 +02:00
Tim Duesterhus	992007ec78	CLEANUP: tree-wide: fix prototypes for functions taking no arguments. "f(void)" is the correct and preferred form for a function taking no argument, while some places use the older "f()". These were reported by clang's -Wmissing-prototypes, for example: src/cpuset.c:111:5: warning: no previous prototype for function 'ha_cpuset_size' [-Wmissing-prototypes] int ha_cpuset_size() include/haproxy/cpuset.h:42:5: note: this declaration is not a prototype; add 'void' to make it a prototype for a zero-parameter function int ha_cpuset_size(); ^ void This aggregate patch fixes this for the following functions: ha_backtrace_to_stderr(), ha_cpuset_size(), ha_panic(), ha_random64(), ha_thread_dump_all_to_trash(), get_exec_path(), check_config_validity(), mworker_child_nb(), mworker_cli_proxy_(create|stop)(), mworker_cleantasks(), mworker_cleanlisteners(), mworker_ext_launch_all(), mworker_reload(), mworker_(env|proc_list)_to_(proc_list|env)(), mworker_(un|)block_signals(), proxy_adjust_all_maxconn(), proxy_destroy_all_defaults(), get_tainted(), pool_total_(allocated|used)(), thread_isolate(_full|)(), thread(_sync|)_release(), thread_harmless_till_end(), thread_cpu_mask_forced(), dequeue_all_listeners(), next_timer_expiry(), wake_expired_tasks(), process_runnable_tasks(), init_acl(), init_buffer(), (de|)init_log_buffers(), (de|)init_pollers(), fork_poller(), pool_destroy_all(), pool_evict_from_local_caches(), pool_total_failures(), dump_pools_to_trash(), cfg_run_diagnostics(), tv_init_(process|thread)_date(), __signal_process_queue(), deinit_signals(), haproxy_unblock_signals()	2021-09-15 11:07:18 +02:00
Willy Tarreau	7b2ac29a92	CLEANUP: fd: remove the now unneeded fd_mig_lock This is not needed anymore since we don't use it when setting the running mask anymore.	2021-08-04 16:03:36 +02:00
Willy Tarreau	b201b1dab1	CLEANUP: fd: remove the now unused fd_set_running() It was inlined inside fd_update_events() since it relies on a loop that may return immediate failure codes.	2021-08-04 16:03:36 +02:00
Willy Tarreau	200bd50b73	MEDIUM: fd: rely more on fd_update_events() to detect changes This function already performs a number of checks prior to calling the IOCB, and detects the change of thread (FD migration). Half of the controls are still in each poller, and these pollers also maintain activity counters for various cases. Note that the unreliable test on thread_mask was removed so that only the one performed by fd_set_running() is now used, since this one is reliable. Let's centralize all that fd-specific logic into the function and make it return a status among: FD_UPDT_DONE, // update done, nothing else to be done FD_UPDT_DEAD, // FD was already dead, ignore it FD_UPDT_CLOSED, // FD was closed FD_UPDT_MIGRATED, // FD was migrated, ignore it now Some pollers already used to call it last and have nothing to do after it, regardless of the result. epoll has to delete the FD in case a migration is detected. Overall this removes more code than it adds.	2021-07-30 17:45:18 +02:00
Willy Tarreau	84c7922c52	REORG: fd: uninline fd_update_events() This function has become a monster (80 lines and 2/3 of a kB), it doesn't benefit from being static nor inline anymore, let's move it to fd.c.	2021-07-30 17:41:55 +02:00
Willy Tarreau	a199a17d72	MINOR: fd: update flags only once in fd_update_events() Since 2.4 with commit `f50906519` ("MEDIUM: fd: merge fdtab[].ev and state for FD_EV_* and FD_POLL_* into state") we can merge all flag updates at once in fd_update_events(). Previously this was performed in 1 to 3 steps, setting the polling state, then setting READY_R if in/err/hup, and setting READY_W if out/err. But since the commit above, all flags are stored together in the same structure field that is being updated with the new flags, thus we can simply update the flags altogether and avoid multiple atomic operations. This even removes the need for atomic ops for FDs that are not shared.	2021-07-30 17:41:55 +02:00
Willy Tarreau	d5402b8df8	BUG/MINOR: fd: protect fd state harder against a concurrent takeover There's a theoretical race (that we failed to trigger) in function fd_update_events(), which could strike on idle connections. The "locked" variable will most often be 0 as the FD is bound to the current thread only. Another thread could take it over once "locked" is set, change the thread and running masks. Then the first thread updates the FD's state non-atomically and possibly overwrites what the other thread was preparing. It still looks like the FD's state will ultimately converge though. The solution against this is to set the running flag earlier so that a takeover() attempt cannot succeed, or that the fd_set_running() attempt fails, indicating that nothing needs to be done on this FD. While this is sufficient for a simple fix to be backported, it leaves the FD actively polled in the calling thread, this will trigger a second wakeup which will notice the absence of tid_bit in the thread_mask, getting rid of it. A more elaborate solution would consist in calling fd_set_running() directly from the pollers before calling fd_update_events(), getting rid of the thread_mask test and letting the caller eliminate that FD from its list if needed. Interestingly, this code also proves to be suboptimal in that it sets the FD state twice instead of calculating the new state at once and always using a CAS to set it. This is a leftover of a simplification that went into 2.4 and which should be explored in a future patch. This may be backported as far as 2.2.	2021-07-30 14:54:19 +02:00
Willy Tarreau	1197459e0a	BUG/MAJOR: fd: switch temp values to uint in fd_stop_both() With latest commit `f50906519` ("MEDIUM: fd: merge fdtab[].ev and state for FD_EV_* and FD_POLL_* into state") one occurrence of a pair of chars was missed in fd_stop_both(), resulting in the operation to fail if the upper flags were set. Interestingly it managed to fail 2 tests in all setups in the CI while all used to work fine on my local machines. Probably that the reason is that the chars had enough room above them for the CAS to fail then refill "old" overwriting the upper parts of the stack, and that thanks to this the subsequent tests worked. With ASAN being used on lots of tests, it very likely caught it but used to only report failed tests with no more info. No backport is needed, as this was never released nor backported.	2021-04-07 20:46:26 +02:00
Willy Tarreau	4781b1521a	CLEANUP: atomic/tree-wide: replace single increments/decrements with inc/dec This patch replaces roughly all occurrences of an HA_ATOMIC_ADD(&foo, 1) or HA_ATOMIC_SUB(&foo, 1) with the equivalent HA_ATOMIC_INC(&foo) and HA_ATOMIC_DEC(&foo) respectively. These are 507 changes over 45 files.	2021-04-07 18:18:37 +02:00
Willy Tarreau	1db427399c	CLEANUP: atomic: add an explicit _FETCH variant for add/sub/and/or Currently our atomic ops return a value but it's never known whether the fetch is done before or after the operation, which causes some confusion each time the value is desired. Let's create an explicit variant of these operations suffixed with _FETCH to explicitly mention that the fetch occurs after the operation, and make use of it at the few call places.	2021-04-07 18:18:37 +02:00
Willy Tarreau	9063a660cc	MINOR: fd: move .exported into fdtab[].state No need to keep this flag apart any more, let's merge it into the global state.	2021-04-07 18:10:36 +02:00
Willy Tarreau	5362bc9044	MINOR: fd: move .et_possible into fdtab[].state No need to keep this flag apart any more, let's merge it into the global state.	2021-04-07 18:09:43 +02:00
Willy Tarreau	030dae13a0	MINOR: fd: move .cloned into fdtab[].state No need to keep this flag apart any more, let's merge it into the global state.	2021-04-07 18:08:29 +02:00
Willy Tarreau	b41a6e9101	MINOR: fd: move .linger_risk into fdtab[].state No need to keep this flag apart any more, let's merge it into the global state. The CLI's output state was extended to 6 digits and the linger/cloned flags moved inside the parenthesis.	2021-04-07 18:07:49 +02:00
Willy Tarreau	f509065191	MEDIUM: fd: merge fdtab[].ev and state for FD_EV_* and FD_POLL_* into state For a long time we've had fdtab[].ev and fdtab[].state which contain two arbitrary sets of information, one is mostly the configuration plus some shutdown reports and the other one is the latest polling status report which also contains some sticky error and shutdown reports. These ones used to be stored into distinct chars, complicating certain operations and not even allowing to clearly see concurrent accesses (e.g. fd_delete_orphan() would set the state to zero while fd_insert() would only set the event to zero). This patch creates a single uint with the two sets in it, still delimited at the byte level for better readability. The original FD_EV_* values remained at the lowest bit levels as they are also known by their bit value. The next step will consist in merging the remaining bits into it. The whole bits are now cleared both in fd_insert() and _fd_delete_orphan() because after a complete check, it is certain that in both cases these functions are the only ones touching these areas. Indeed, for _fd_delete_orphan(), the thread_mask has already been zeroed before a poller can call fd_update_event() which would touch the state, so it is certain that _fd_delete_orphan() is alone. Regarding fd_insert(), only one thread will get an FD at any moment, and as this FD has already been released by _fd_delete_orphan() by definition, it is certain that previous users have definitely stopped touching it. Strictly speaking there's no need for clearing the state again in fd_insert() but it's cheap and will remove some doubts during some troubleshooting sessions.	2021-04-07 18:04:39 +02:00
Willy Tarreau	8d27c203ed	MEDIUM: fd: prepare FD_POLL_* to move to bits 8-15 In preparation of merging FD_POLL* and FD_EV, this only changes the value of FD_POLL_ to use bits 8-15 (the second byte). The size of the field has been temporarily extended to 32 bits already, as well as the temporary variables that carry the new composite value inside fd_update_events(). The resulting fdtab entry becomes temporarily unaligned. All places making access to .ev or FD_POLL_* were carefully inspected to make sure they were safe regarding this change. Only one temporary update was needed for the "show fd" code. The code was only slightly inflated at this step.	2021-04-07 15:08:40 +02:00
Willy Tarreau	fc0cdfb9b7	CLEANUP: fd: remove FD_POLL_DATA and FD_POLL_STICKY The former was not used and the second was used only as a positive mask of the flags to keep instead of having the flags that are updated. Both were removed in favor of a new FD_POLL_UPDT_MASK that only mentions the updated flags. This will ease merging of state and ev later.	2021-04-07 15:08:40 +02:00
Willy Tarreau	6cf13119e2	CLEANUP: fd: remove unused fd_set_running_excl() This one is no longer used and was the origin of the previously mentioned deadlock.	2021-03-24 17:17:21 +01:00
Willy Tarreau	2c3f9818e8	BUG/MEDIUM: fd: do not wait on FD removal in fd_delete() Christopher discovered an issue mostly affecting 2.2 and to a lesser extent 2.3 and above, which is that it's possible to deadlock a soft-stop when several threads are using a same listener: thread1 thread2 unbind_listener() fd_set_running() lock(listener) listener_accept() fd_delete() lock(listener) while (running_mask); -----> deadlock unlock(listener) This simple case disappeared from 2.3 due to the removal of some locked operations at the end of listener_accept() on the regular path, but the architectural problem is still here and caused by a lock inversion built around the loop on running_mask in fd_clr_running_excl(), because there are situations where the caller of fd_delete() may hold a lock that is preventing other threads from dropping their bit in running_mask. The real need here is to make sure the last user deletes the FD. We have all we need to know the last one, it's the one calling fd_clr_running() last, or entering fd_delete() last, both of which can be summed up as the last one calling fd_clr_running() if fd_delete() calls fd_clr_running() at the end. And we can prevent new threads from appearing in running_mask by removing their bits in thread_mask. So what this patch does is that it sets the running_mask for the thread in fd_delete(), clears the thread_mask, thus marking the FD as orphaned, then clears the running mask again, and completes the deletion if it was the last one. If it was not, another thread will pass through fd_clr_running and will complete the deletion of the FD. The bug is easily reproducible in 2.2 under high connection rates during soft close. When the old process stops its listener, occasionally two threads will deadlock and the old process will then be killed by the watchdog. It's strongly believed that similar situations do exist in 2.3 and 2.4 (e.g. if the removal attempt happens during resume_listener() called from listener_accept()) but if so, they should be much harder to trigger. This should be backported to 2.2 as the issue appeared with the FD migration. It requires previous patches "fd: make fd_clr_running() return the remaining running mask" and "MINOR: fd: remove the unneeded running bit from fd_insert()". Notes for backport: in 2.2, the fd_dodelete() function requires an extra argument "do_close" indicating whether we want to remove and close the FD (fd_delete) or just delete it (fd_remove). While this information is not conveyed along the chain, we know that late calls always imply do_close=1 because do_close=0 exclusively results from fd_remove() which is only used by the config parser and the master, both of which are single-threaded, hence are always the last ones in the running_mask. Thus it is safe to assume that a postponed FD deletion always implies do_close=1. Thanks to Olivier for his help in designing this optimal solution.	2021-03-24 17:17:21 +01:00
Willy Tarreau	71bada5ca4	MINOR: fd: remove the unneeded running bit from fd_insert() There's no point taking the running bit in fd_insert() since by definition there will never be more than one thread inserting the FD, and that fd_insert() may only be done after the fd was allocated by the system, indicating the end of use by any other thread. This will need to be backported to 2.2 to fix an issue.	2021-03-24 17:17:21 +01:00
Willy Tarreau	6e8e10b415	MINOR: fd: make fd_clr_running() return the remaining running mask We'll need to know that a thread is the last one to use an fd, so let's make fd_clr_running() return the remaining bits after removal. Note that in practice we're only interested in knowing if it's zero but the compiler doesn't make use of the flags after the AND and emits a CMPXCHG anyway :-/ This will need to be backported to 2.2 to fix an issue.	2021-03-24 17:17:21 +01:00
Willy Tarreau	5926e384e6	BUG/MINOR: fd: properly wait for !running_mask in fd_set_running_excl() In fd_set_running_excl() we don't reset the old mask in the CAS loop, so if we fail on the first round, we'll forcefully take the FD on the next one. In practice it's used by fd_insert() and fd_delete() only, neither of which is supposed to be passed an FD which is still in use, given that for now only listeners may be enabled on multiple threads at once. This can be backported to 2.2 but shouldn't result in fixing any user visible bug for now.	2021-02-24 19:40:49 +01:00
Willy Tarreau	586f71b43f	REORG: connection: move the socket iocb (conn_fd_handler) to sock.c conn_fd_handler() is 100% specific to socket code. It's about time it moves to sock.c which manipulates socket FDs. With it comes conn_fd_check() which tests for the socket's readiness. The ugly connection status check at the end of the iocb was moved to an inlined function in connection.h so that if we need it for other socket layers it's not too hard to reuse. The code was really only moved and not changed at all.	2020-12-11 16:26:00 +01:00
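
Illustration for commit 922a907926 ("MINOR: fd: add a new function to only raise RLIMIT_NOFILE"): the message describes comparing the desired setting with the current one and calling setrlimit() only when a value grows. The following is a minimal sketch of that idea, not the code from the patch. The names raise_only_merge() and raise_nofile_sketch() are hypothetical, and keeping the larger of each field is an assumption about how "never shrink" could be enforced. It also assumes RLIM_INFINITY compares as the largest rlim_t value, which holds on Linux.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: computes in <out> a setting that never lowers
 * either limit of <cur>. Returns 1 if at least one value of <want> is
 * larger than the current one (i.e. setrlimit() is worth calling),
 * otherwise 0.
 */
static int raise_only_merge(const struct rlimit *cur,
                            const struct rlimit *want,
                            struct rlimit *out)
{
    int raise = 0;

    assert(cur && want && out);
    *out = *cur;
    if (want->rlim_cur > cur->rlim_cur) {
        out->rlim_cur = want->rlim_cur;
        raise = 1;
    }
    if (want->rlim_max > cur->rlim_max) {
        out->rlim_max = want->rlim_max;
        raise = 1;
    }
    return raise;
}

/* Hypothetical wrapper: reads the current RLIMIT_NOFILE and only calls
 * setrlimit() when something has to be raised. Returns 0 when nothing
 * had to be done or the raise succeeded, -1 on failure.
 */
static int raise_nofile_sketch(const struct rlimit *want)
{
    struct rlimit cur, merged;

    if (getrlimit(RLIMIT_NOFILE, &cur) != 0)
        return -1;
    if (!raise_only_merge(&cur, want, &merged))
        return 0; /* shrinking or equal: leave the limit untouched */
    return setrlimit(RLIMIT_NOFILE, &merged);
}
```

With this shape, a request to shrink the limit becomes a no-op instead of an error, which matches the behaviour the commit message asks for in restricted environments.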
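
Illustration for commits 9464bb1f05 and 080373ea38: these messages describe a tgid stored in the lower 16 bits of a word with a refcount in the upper 16 bits, plus grab/claim/drop operations on it. The sketch below shows one way such a packed field can be manipulated with C11 atomics. It is not HAProxy's code: struct fd_slot and the slot_*() names are hypothetical, the claim operation returns failure instead of waiting for the refcount to reach zero, and refcount overflow is ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TGID_MASK 0x0000ffffu /* lower 16 bits: thread group ID */
#define REF_ONE   0x00010000u /* one reference in the upper 16 bits */

/* Hypothetical stand-in for the relevant part of an fdtab entry */
struct fd_slot {
    _Atomic uint32_t refc_tgid;
};

/* Takes a reference only if the slot currently belongs to <tgid>.
 * Returns 1 on success, 0 if the tgid differs.
 */
static int slot_grab(struct fd_slot *s, uint32_t tgid)
{
    uint32_t old = atomic_load(&s->refc_tgid);

    do {
        if ((old & TGID_MASK) != tgid)
            return 0;
        /* on CAS failure <old> is refreshed and the test is redone */
    } while (!atomic_compare_exchange_weak(&s->refc_tgid, &old, old + REF_ONE));
    return 1;
}

/* Releases a reference previously taken by slot_grab()/slot_try_claim() */
static void slot_drop(struct fd_slot *s)
{
    uint32_t prev = atomic_fetch_sub(&s->refc_tgid, REF_ONE);

    assert(prev >= REF_ONE); /* dropping without holding is a bug */
    (void)prev;
}

/* Assigns <tgid> to the slot and takes one reference, but only if no
 * reference is currently held. Returns 1 on success, 0 otherwise.
 */
static int slot_try_claim(struct fd_slot *s, uint32_t tgid)
{
    uint32_t old = atomic_load(&s->refc_tgid);

    do {
        if (old & ~TGID_MASK)
            return 0; /* still referenced by someone else */
    } while (!atomic_compare_exchange_weak(&s->refc_tgid, &old, REF_ONE | tgid));
    return 1;
}
```

Because both fields live in a single word, one compare-and-swap checks the group and adjusts the refcount together, so a holder of a reference knows the slot cannot be reassigned to another group underneath it.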
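
Illustration for commits d6e1987612, 6e8e10b415 and 1db427399c: these messages revolve around whether an atomic AND reports the value before or after the operation, and how that lets a thread learn in a single step whether its bit was set or whether it was the last user. A minimal sketch with C11 atomics follows. It uses atomic_fetch_and(), which returns the value held before the operation, and derives the "after" value from it. The type running_mask_t and both function names are hypothetical and do not reproduce the HA_ATOMIC_* macros.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for an FD's running mask: one bit per thread */
typedef _Atomic unsigned long running_mask_t;

/* Clears <bit> and returns the mask as it was BEFORE the clear, so the
 * caller can tell in one operation whether its bit was present.
 */
static unsigned long clr_running_prev(running_mask_t *mask, unsigned long bit)
{
    assert(bit != 0);
    return atomic_fetch_and(mask, ~bit);
}

/* Clears <bit> and returns the mask as it is AFTER the clear; a zero
 * result means the caller was the last thread running on this FD.
 */
static unsigned long clr_running_remaining(running_mask_t *mask, unsigned long bit)
{
    return clr_running_prev(mask, bit) & ~bit;
}
```

Naming the two variants separately avoids the confusion the 1db427399c message mentions about when the fetch happens relative to the operation.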
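
Illustration for commit a80e4a3546 ("MINOR: fd: add functions to set O_NONBLOCK and FD_CLOEXEC"): the message describes centralizing the fcntl() calls and notes that the non-blocking helper purposely skips reading the previous flags because it targets new FDs. Here is a minimal sketch of two such helpers using only POSIX fcntl(). The sketch_* names are hypothetical, and preserving the existing descriptor flags in the close-on-exec helper is an assumption rather than something the commit message states.

```c
#include <assert.h>
#include <fcntl.h>

/* Sets O_NONBLOCK on a freshly created FD. The previous status flags are
 * deliberately not read, so any other F_SETFL-controlled flag is reset.
 * Returns 0 on success, -1 on error.
 */
static int sketch_set_nonblock(int fd)
{
    assert(fd >= 0);
    return fcntl(fd, F_SETFL, O_NONBLOCK) == -1 ? -1 : 0;
}

/* Sets FD_CLOEXEC while preserving the other descriptor flags (assumed
 * behaviour for this sketch). Returns 0 on success, -1 on error.
 */
static int sketch_set_cloexec(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```

Keeping these in one place means call sites no longer need to include fcntl.h themselves, which is one of the benefits the commit message lists.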

57 Commits