Mirror of https://git.haproxy.org/git/haproxy.git/ (synced 2025-08-07 23:56:57 +02:00)
BUG/MEDIUM: server/addr: fix tune.events.max-events-at-once event miss and leak
An issue has been introduced with cd99440 ("BUG/MAJOR: server/addr: fix a race during server addr:svc_port updates").

Indeed, in the above commit we implemented the atomic_sync task, which is responsible for consuming pending server events to apply the changes atomically. For now only server addr updates are concerned. To prevent the task from causing contention, a budget was assigned to it. It can be controlled with the global tunable 'tune.events.max-events-at-once': the task may not process more than this number of events at once.

However, a bug was introduced with this budget logic: each time the task has to be interrupted because it runs out of budget, we reschedule the task to finish where it left off, but the current event, which was already removed from the queue, has not been processed yet. This means that one pending event (every 'tune.events.max-events-at-once' events) is effectively lost.

When the atomic_sync task deals with a large number of concurrent events, this bug has 2 known consequences. First, a server addr/port update will be lost every 'tune.events.max-events-at-once' events. This can of course cause reliability issues, because if the event is not republished periodically, the server could stay in a stale state for an indefinite amount of time. This is the case when the DNS server flaps for instance: some servers may not come back UP after the incident, as described in GH #2666. Another issue is that the lost event was not cleaned up, resulting in a small memory leak. So in the end, the bug is likely to cause more and more degradation over time until haproxy is restarted.

As a workaround, 'tune.events.max-events-at-once' may be set to the maximum number of events expected per batch. Note however that this value cannot exceed 10 000, otherwise it could cause the watchdog to trigger due to the task being busy for too long and preventing other threads from making any progress. Setting higher values may not be optimal for common workloads, so it should only be used to mitigate the bug while waiting for this fix. Since 'tune.events.max-events-at-once' defaults to 100, this bug only affects configs that involve more than 100 servers whose addr:port properties are likely to be updated at the same time (batched updates from cli, lua, dns..).

To fix the bug, we move the budget check after the current event is fully handled. For that we went from a basic 'while' loop to a 'do..while' loop, as we assume from the config that 'tune.events.max-events-at-once' cannot be 0. While at it, we reschedule the task once thread isolation ends (it was not required to perform the reschedule while under isolation) to hand control back to waiting threads faster.

This patch should be backported up to 2.9 with cd99440. It should fix GH #2666.
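For reference, the workaround described above is a single directive in the global section of the configuration. A minimal sketch (the value 5000 is only an example; size it to the largest batch of server events expected for your setup, and keep it well below 10 000):

    global
        tune.events.max-events-at-once 5000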
parent aaaacaaf4b
commit 8f1fd96d17

src/server.c: 34 changed lines
--- a/src/server.c
+++ b/src/server.c
@@ -228,7 +228,11 @@ static struct task *server_atomic_sync(struct task *task, void *context, unsigned int state)
         struct event_hdl_async_event *event;
 
         /* check for new server events that we care about */
-        while ((event = event_hdl_async_equeue_pop(&server_atomic_sync_queue))) {
+        do {
+                event = event_hdl_async_equeue_pop(&server_atomic_sync_queue);
+                if (!event)
+                        break; /* no events in queue */
+
                 if (event_hdl_sub_type_equal(event->type, EVENT_HDL_SUB_END)) {
                         /* ending event: no more events to come */
                         event_hdl_async_free_event(event);
@@ -237,20 +241,6 @@ static struct task *server_atomic_sync(struct task *task, void *context, unsigned int state)
                         break;
                 }
 
-                if (!remain) {
-                        /* STOP: we've already spent all our budget here, and
-                         * considering we possibly are under isolation, we cannot
-                         * keep blocking other threads any longer.
-                         *
-                         * Reschedule the task to finish where we left off if
-                         * there are remaining events in the queue.
-                         */
-                        if (!event_hdl_async_equeue_isempty(&server_atomic_sync_queue))
-                                task_wakeup(task, TASK_WOKEN_OTHER);
-                        break;
-                }
-                remain--;
-
                 /* new event to process */
                 if (event_hdl_sub_type_equal(event->type, EVENT_HDL_SUB_SERVER_INETADDR)) {
                         struct sockaddr_storage new_addr;
@@ -327,7 +317,7 @@ static struct task *server_atomic_sync(struct task *task, void *context, unsigned int state)
                                 srv_set_addr_desc(srv, 1);
                 }
                 event_hdl_async_free_event(event);
-        }
+        } while (--remain); // event_hdl_tune.max_events_at_once is expected to be > 0
 
         /* some events possibly required thread_isolation:
          * now that we are done, we must leave thread isolation before
@@ -336,6 +326,18 @@ static struct task *server_atomic_sync(struct task *task, void *context, unsigned int state)
         if (thread_isolated())
                 thread_release();
 
+        if (!remain) {
+                /* we stopped because we've already spent all our budget here,
+                 * and considering we possibly were under isolation, we cannot
+                 * keep blocking other threads any longer.
+                 *
+                 * Reschedule the task to finish where we left off if
+                 * there are remaining events in the queue.
+                 */
+                if (!event_hdl_async_equeue_isempty(&server_atomic_sync_queue))
+                        task_wakeup(task, TASK_WOKEN_OTHER);
+        }
+
         return task;
 }
 
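To make the faulty pattern easy to reproduce outside of HAProxy, here is a standalone C sketch (hypothetical names and a toy queue, not HAProxy code) contrasting the old "pop first, then check the budget" loop with the fixed "handle the event first, then spend the budget" do..while variant shown in the diff above:

    /*
     * Standalone illustration of the budget bug fixed by this patch.
     * Hypothetical names and a toy event queue, not HAProxy code.
     * Build with: cc -o budget_demo budget_demo.c
     */
    #include <stdio.h>
    #include <stdlib.h>

    struct event {
            struct event *next;
    };

    static struct event *queue_head;

    static struct event *equeue_pop(void)
    {
            struct event *e = queue_head;

            if (e)
                    queue_head = e->next;
            return e;
    }

    static void equeue_fill(int n)
    {
            while (n--) {
                    struct event *e = calloc(1, sizeof(*e));

                    e->next = queue_head;
                    queue_head = e;
            }
    }

    /* Old pattern: the budget is checked after the event was already popped,
     * so when the budget reaches 0 the popped event is neither processed nor
     * freed: one event is lost (and leaked) per interruption.
     */
    static int drain_old(int budget)
    {
            struct event *e;
            int processed = 0;

            while ((e = equeue_pop())) {
                    if (!budget)
                            break;       /* <-- 'e' is silently dropped and leaked here */
                    budget--;
                    processed++;         /* "handle" the event */
                    free(e);
            }
            return processed;
    }

    /* New pattern: pop, fully handle and free the event, and only then spend
     * the budget, mirroring the do..while loop introduced by the patch.
     */
    static int drain_new(int budget)
    {
            struct event *e;
            int processed = 0;

            do {
                    e = equeue_pop();
                    if (!e)
                            break;       /* no events in queue */
                    processed++;         /* "handle" the event */
                    free(e);
            } while (--budget);          /* the budget is expected to be > 0 */
            return processed;
    }

    int main(void)
    {
            int total;

            equeue_fill(5);
            total  = drain_old(3);       /* first wakeup: 3 handled, 4th dropped */
            total += drain_old(3);       /* second wakeup: only the 5th remains */
            printf("old: %d of 5 events processed\n", total);   /* 4: one lost */

            equeue_fill(5);
            total  = drain_new(3);       /* first wakeup: 3 handled, 2 kept queued */
            total += drain_new(3);       /* second wakeup: remaining 2 handled */
            printf("new: %d of 5 events processed\n", total);   /* 5: none lost */
            return 0;
    }

With a budget of 3 and 5 queued events, the old loop ends up processing only 4 of them across two wakeups (the event popped at the moment the budget ran out is lost and leaked), while the fixed loop processes all 5.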