mirror of
https://git.haproxy.org/git/haproxy.git/
synced 2025-09-21 22:01:31 +02:00
Aurélien found an issue introduced in 2.7-dev8 with commit d114f4a68 ("MEDIUM: checks: spread the checks load over random threads"), but which in fact has deeper roots. When a server's state is changed via __health_adjust(), if a fastinter setting is set, the task gets rescheduled to run at the new date. The way it's done is not thread safe, as nothing prevents another thread where the task is already running from also updating the expire field in parallel. But since such events are quite rare, this statistically never happens. However, with the commit above, the tasks are no longer required to go to the shared wait queue and are no longer marked as shared between multiple threads. It's just that *any* thread may run them at a time without implying that all of them are allowed to modify them. And this change is sufficient to trigger the BUG_ON() condition in the scheduler that detects the inconsistency between a task queued in one thread and being manipulated in parallel by another one: FATAL: bug condition "task->tid != tid" matched at include/haproxy/task.h:670 call trace(13): | 0x55f61cf520c9 [c6 04 25 01 00 00 00 00]: main-0x2ee7 | 0x55f61d0646e8 [8b 45 08 a8 40 0f 85 65]: back_handle_st_cer+0x78/0x4d7 | 0x55f61cff3e72 [41 0f b6 4f 01 e9 c8 df]: process_stream+0x2252/0x364f | 0x55f61d0d2fab [48 89 c3 48 85 db 74 75]: run_tasks_from_lists+0x34b/0x8c4 | 0x55f61d0d38ad [29 44 24 18 8b 54 24 18]: process_runnable_tasks+0x37d/0x6c6 | 0x55f61d0a22fa [83 3d 0b 63 1e 00 01 0f]: run_poll_loop+0x13a/0x536 | 0x55f61d0a28c9 [48 8b 1d f0 46 19 00 48]: main+0x14d919 | 0x55f61cf56dfe [31 c0 e8 eb 93 1b 00 31]: main+0x1e4e/0x2d5d At first glance it looked like it could be addressed in the scheduler only, but in fact the problem clearly is at the application level, since some shared fields are manipulated without protection. At minima, the task's expiry ought to be touched only under the server's lock. While it's arguable that the scheduler could make such updates easier, changing it alone will not be sufficient here. Looking at the sequencing closer, it becomes obvious that we do not need this task_schedule() at all: a simple task_wakeup() is sufficient for the callee to update its timers. Indeed, the process_chk_con() function already deals with spurious wakeups, and already uses srv_getinter() to calculate the next wakeup date based on the current state. So here, instead of having to queue the task from __health_adjust() to anticipate a new check, we can simply wake the task up and let it decide when it needs to run next. This is much cleaner as the expiry calculation remains performed at a single place, from the task itself, as it should be, and it fixes the problem above. This should be backported to 2.7, but not to older versions where the risks of breakage are higher than the chance to fix something that ever happened.
The HAProxy documentation has been split into a number of different files for ease of use. Please refer to the following files depending on what you're looking for : - INSTALL for instructions on how to build and install HAProxy - BRANCHES to understand the project's life cycle and what version to use - LICENSE for the project's license - CONTRIBUTING for the process to follow to submit contributions The more detailed documentation is located into the doc/ directory : - doc/intro.txt for a quick introduction on HAProxy - doc/configuration.txt for the configuration's reference manual - doc/lua.txt for the Lua's reference manual - doc/SPOE.txt for how to use the SPOE engine - doc/network-namespaces.txt for how to use network namespaces under Linux - doc/management.txt for the management guide - doc/regression-testing.txt for how to use the regression testing suite - doc/peers.txt for the peers protocol reference - doc/coding-style.txt for how to adopt HAProxy's coding style - doc/internals for developer-specific documentation (not all up to date)
Description
Languages
C
98.1%
Shell
0.8%
Makefile
0.5%
Lua
0.2%
Python
0.2%