BUG/MEDIUM: tasks: Do not loop in task_schedule() if a task is running

Commit 7e1cc0fcdbcace75a957a69fc8a4d991f7b30fdb made it so we'd loop in
task_schedule() if the task is currently running, so that we'd be sure
that the task would not overwrite the expire field. However,
task_schedule() may be called while a lock is held, and if the running
task attempts to acquire that lock, it will lead to a deadlock.
So instead, if the task is running, just wake it up, so that we're sure
that it will reschedule itself correctly, as it should do anyway. We
already do nothing if the task is in a run queue, so it is expected that
tasks do that, and if they do not, then it is a bug.

This should fix github issue #3350
This should be backported where 7e1cc0fcdbcace75a957a69fc8a4d991f7b30fdb
has been backported, which should be up to 2.8.
This commit is contained in:
Olivier Houchard 2026-04-29 12:24:08 +02:00 committed by Olivier Houchard
parent f9c2d47677
commit 261aa23522

View File

@ -713,17 +713,21 @@ static inline void _task_schedule(struct task *task, int when, const struct ha_c
#ifdef USE_THREAD
if (task->tid < 0) {
int was_running;
/*
* Make sure the task is not already running before changing
* its expire, otherwise it could overwrite our modification
* If the task is already running, then just wake it up, just
* in case it did not notice it should reschedule itself.
* There is nothing else we can do, if it runs it may
* overwrite the expire field, so we can't just set it, and
* we can afford to wait until it is no longer running, just
* in case we are currently holding a lock, and the running
* task tries to acquire that lock.
* Tasks are supposed to take care of their own scheduling if
* needed anyway, we already do nothing if the task is in the
* runqueue, so it should be okay.
*/
if (task == th_ctx->current)
was_running = 1;
else {
was_running = 0;
while (HA_ATOMIC_FETCH_OR(&task->state, TASK_RUNNING) & TASK_RUNNING)
__ha_cpu_relax();
if (HA_ATOMIC_FETCH_OR(&task->state, TASK_RUNNING) & TASK_RUNNING) {
task_wakeup(task, TASK_WOKEN_OTHER);
return;
}
/* FIXME: is it really needed to lock the WQ during the check ? */
@ -732,8 +736,7 @@ static inline void _task_schedule(struct task *task, int when, const struct ha_c
when = tick_first(when, task->expire);
task->expire = when;
if (!was_running)
task_drop_running(task, 0);
task_drop_running(task, 0);
if (!task_in_wq(task) || tick_is_lt(task->expire, task->wq.key)) {
if (likely(caller)) {
caller = HA_ATOMIC_XCHG(&task->caller, caller);