From 2e270cf0b0824fb2b83f2ee737a75272687ba9c4 Mon Sep 17 00:00:00 2001
From: Willy Tarreau
Date: Thu, 16 Feb 2023 09:07:00 +0100
Subject: [PATCH] BUG/MINOR: sched: properly report long_rq when tasks remain
 in the queue

There's a per-thread "long_rq" counter that is used to indicate how
often we leave the scheduler with tasks still present in the run queue.
Its purpose is to show when tune.runqueue-depth served to limit latency,
due to a large number of tasks being runnable at once.

However, there's a bug there: the counter is not always set. If, after
the first run, one heavy task was processed and later only heavy tasks
remain, we'll loop back to not_done_yet where we try to pick more tasks,
but none are eligible (since heavy ones have already run), so we
directly return without incrementing the counter. This is what causes
ultra-low values of long_rq during massive SSL handshakes, which are
confusing because they make one believe that tl_class_mask doesn't have
the HEAVY flag anymore. Let's just fix that by not returning from the
middle of the function. This can be backported as far as 2.4.
---
 src/task.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/task.c b/src/task.c
index faf21f3f1..d4625535a 100644
--- a/src/task.c
+++ b/src/task.c
@@ -765,7 +765,7 @@ void process_runnable_tasks()
 	 */
 	max_total = max[TL_URGENT] + max[TL_NORMAL] + max[TL_BULK] + max[TL_HEAVY];
 	if (!max_total)
-		return;
+		goto leave;
 
 	for (queue = 0; queue < TL_CLASSES; queue++)
 		max[queue] = ((unsigned)max_processed * max[queue] + max_total - 1) / max_total;
@@ -864,6 +864,7 @@ void process_runnable_tasks()
 	if (max_processed > 0 && thread_has_tasks())
 		goto not_done_yet;
 
+ leave:
 	if (tt->tl_class_mask)
 		activity[tid].long_rq++;
 }