From 1553b6657dd15565baaa0220ab232ba6c9cf7797 Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Tue, 30 Jun 2020 13:46:21 +0200 Subject: [PATCH] BUG/MINOR: sched: properly cover for a rare MT_LIST_ADDQ() race In commit 3ef7a190b ("MEDIUM: tasks: apply a fair CPU distribution between tasklet classes") we compute a total weight to be used to split the CPU time between queues. There is a mention that the total cannot be null, wihch is based on the fact that we only get there if thread_has_task() returns non-zero. But there is a very small race which can break this assumption: if two threads conflict on MT_LIST_ADDQ() on an empty shared list and both roll back before trying again, there is the possibility that a first call to MT_LIST_ISEMPTY() sees the first thread install itself, then the second call will see the list empty when both roll back. Thus we could proceed with the queue while it's temporarily empty and compute max lengths using a divide by zero. This case is very hard to trigger, it seldom happens on 16 threads at 400k req/s. Let's simply test for max_total and leave the loop when we've not found any work. No backport is needed, that's 2.2-only. --- src/task.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/src/task.c b/src/task.c index 6079956ba..9f2e9d5df 100644 --- a/src/task.c +++ b/src/task.c @@ -514,11 +514,17 @@ void process_runnable_tasks() max[TL_BULK] = default_weights[TL_BULK]; /* Now compute a fair share of the weights. Total may slightly exceed - * 100% due to rounding, this is not a problem. Note that by design - * the sum cannot be NULL as we cannot get there without tasklets to - * process. + * 100% due to rounding, this is not a problem. Note that while in + * theory the sum cannot be NULL as we cannot get there without tasklets + * to process, in practice it seldom happens when multiple writers + * conflict and rollback on MT_LIST_ADDQ(shared_tasklet_list), causing + * a first MT_LIST_ISEMPTY() to succeed for thread_has_task() and the + * one above to finally fail. This is extremely rare and not a problem. */ max_total = max[TL_URGENT] + max[TL_NORMAL] + max[TL_BULK]; + if (!max_total) + return; + for (queue = 0; queue < TL_CLASSES; queue++) max[queue] = ((unsigned)max_processed * max[queue] + max_total - 1) / max_total;