BUG/MEDIUM: task: bound the number of tasks picked from the wait queue at once
There is a theoretical problem in the wait queue: with many threads, one of them could spend a lot of time looping on the newly expired tasks, causing heavy contention on the global wq_lock and on the global rq_lock. This initially sounds benign, but if another thread does just a task_schedule() or task_queue(), it may end up waiting a long time for this lock, and that wait time counts against its execution budget, degrading the end user's experience and possibly risking triggering the watchdog if it lasts too long.

The simplest (and backportable) solution consists in bounding the number of expired tasks that a thread may pick from the global wait queue at once, given that all the other threads will do so as well anyway. We don't need to pick more than global.tune.runqueue_depth tasks at once since we won't process more, so this counter is shared between the local and the global queues: threads with more local expired tasks will pick fewer global tasks and conversely, keeping the load balanced between all threads. This guarantees a much lower latency if/when wakeup storms happen (e.g. hundreds of thousands of synchronized health checks).

Note that some crashes have been witnessed with 1/4 of the threads in wake_expired_tasks(), and while the issue may or may not be related, not having reasonable bounds there definitely justifies why so much time can be spent in that function.

This patch should be backported, probably as far as 2.0 (maybe with some adaptations).
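As a rough illustration of the shared-budget idea above, here is a minimal, self-contained C sketch. It is not HAProxy code: the two queues are modeled as plain counters, there is no locking, and all names (MAX_BUDGET, local_expired, global_expired, wake_expired_tasks_sketch) are invented for illustration, with MAX_BUDGET standing in for global.tune.runqueue_depth.

/* Sketch of the shared budget between the local and global wait queues.
 * NOT HAProxy code; queues are modeled as plain counters.
 */
#include <stdio.h>

#define MAX_BUDGET 8              /* stand-in for global.tune.runqueue_depth */

static int local_expired  = 5;    /* expired tasks in this thread's queue */
static int global_expired = 100;  /* expired tasks in the shared queue */

static void wake_expired_tasks_sketch(void)
{
        int max_processed = MAX_BUDGET;

        /* Drain the local queue first; every task woken here consumes one
         * unit of the shared budget. Like the patch's
         * while (max_processed-- > 0), the final failing check also burns
         * one unit, which is fine for a bound.
         */
        while (max_processed-- > 0 && local_expired > 0)
                local_expired--;

        /* The contended global queue only gets the remaining budget, so a
         * thread busy with local work takes fewer global tasks and other
         * threads pick up the rest.
         */
        while (global_expired > 0) {
                if (max_processed-- <= 0)
                        break;  /* budget exhausted; in the real code this
                                 * bounds the time spent under wq_lock */
                global_expired--;
        }

        printf("left over: %d local, %d global\n", local_expired, global_expired);
}

int main(void)
{
        wake_expired_tasks_sketch();
        return 0;
}

Compiled and run as-is, this prints "left over: 0 local, 98 global": the thread spends most of its budget on its own queue and leaves the bulk of the global queue to its peers, which is the balancing effect described above.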
commit 3cfaa8d1e0
parent ba29687bc1
--- a/src/task.c
+++ b/src/task.c
@@ -219,11 +219,12 @@ void __task_queue(struct task *task, struct eb_root *wq)
 void wake_expired_tasks()
 {
         struct task_per_thread * const tt = sched; // thread's tasks
+        int max_processed = global.tune.runqueue_depth;
         struct task *task;
         struct eb32_node *eb;
         __decl_thread(int key);
 
-        while (1) {
+        while (max_processed-- > 0) {
   lookup_next_local:
                 eb = eb32_lookup_ge(&tt->timers, now_ms - TIMER_LOOK_BACK);
                 if (!eb) {
@@ -298,6 +299,8 @@ void wake_expired_tasks()
         while (1) {
                 HA_RWLOCK_WRLOCK(TASK_WQ_LOCK, &wq_lock);
   lookup_next:
+                if (max_processed-- <= 0)
+                        break;
                 eb = eb32_lookup_ge(&timers, now_ms - TIMER_LOOK_BACK);
                 if (!eb) {
                         /* we might have reached the end of the tree, typically because