Commit Graph

121 Commits

Olivier Houchard
0c7a4b6371 MINOR: tasks: Don't set the TASK_RUNNING flag when adding in the tasklet list.
Now that TASK_QUEUED is enforced, there's no need to set TASK_RUNNING when
removing the task from the runqueue to add it to the tasklet list. The flag
will only be set right before we run the task.
2019-04-17 19:28:01 +02:00
Olivier Houchard
de82aeaa26 BUG/MEDIUM: tasks: Make sure we modify global_tasks_mask with the rq_lock.
When modifying global_tasks_mask, make sure we hold the rq_lock, or we
might remove the bit while it has been re-set by somebody else, and we
might not be woken when needed.
2019-04-17 19:28:01 +02:00
Willy Tarreau
b038007ae8 BUG/MEDIUM: tasks: Make sure we set TASK_QUEUED before adding a task to the rq.
Make sure we set TASK_QUEUED in every case before adding the task to the
run queue. task_wakeup() now checks if either TASK_QUEUED or TASK_RUNNING
is set, and if neither is set, add TASK_QUEUED and effectively add the task
to the runqueue.
No longer use __task_wakeup() anywhere except in task_wakeup(), always use
task_wakeup() instead.
With the old code, process_runnable_tasks() could re-add a task to the
runqueue without setting the TASK_QUEUED flag, and there were race
conditions that could lead to a task having the TASK_QUEUED flag but
not being in the runqueue, thus being unschedulable.

This should be backported to 1.9.
2019-04-17 19:28:01 +02:00
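
A minimal sketch of the wakeup logic described above, using C11
atomics in place of HAProxy's HA_ATOMIC_* macros; the flag values and
the insertion helper are illustrative, not the real ones:

    #include <stdatomic.h>

    #define TASK_RUNNING  0x0001
    #define TASK_QUEUED   0x0002

    struct task {
        _Atomic unsigned short state;
        /* ... */
    };

    void __task_insert_into_rq(struct task *t);   /* hypothetical */

    void task_wakeup(struct task *t)
    {
        unsigned short old = atomic_load(&t->state);

        do {
            /* already queued or running: the scheduler will notice
             * the new state by itself, nothing more to do here
             */
            if (old & (TASK_QUEUED | TASK_RUNNING))
                return;
        } while (!atomic_compare_exchange_weak(&t->state, &old,
                         (unsigned short)(old | TASK_QUEUED)));

        /* we won the race: we alone may insert it in the run queue */
        __task_insert_into_rq(t);
    }
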
Willy Tarreau
3466e3cdcb BUILD: task/thread: fix single-threaded build of task.c
As expected, commit cde7902ac ("MEDIUM: tasks: improve fairness between
the local and global queues") broke the build with threads disabled,
and I forgot to rerun this test before committing. No backport is
needed.
2019-04-15 18:52:40 +02:00
Willy Tarreau
c8da044b41 MINOR: tasks: restore the lower latency scheduling when niced tasks are present
In the past we used to reduce the number of tasks consulted at once when
some niced tasks were present in the run queue. This was dropped in 1.8
when the scheduler started to take batches. With the recent fixes it now
becomes possible to restore this behaviour which guarantees a better
latency between tasks when niced tasks are present. Thanks to this, with
the default number of 200 for tune.runqueue-depth, with a parasitic load
of 14000 requests per second, nice 0 gives 14000 rps, nice 1024 gives
12000 rps and nice -1024 gives 16000 rps. The amplitude widens if the
runqueue depth is lowered.
2019-04-15 09:50:56 +02:00
Willy Tarreau
2d1fd0a0d2 MEDIUM: tasks: only base the nice offset on the run queue depth
The offset calculated for the nice value used to be wrong for a long
time and got even worse when the improved multi-thread scheduler was
implemented, because it continued to rely on the run queue size,
which became irrelevant given that we extract tasks in batches, so
the run queue size follows a sawtooth pattern.

However the offset much better reflects insertion positions in the
queue, so it's worth dropping this rq_size component of the equation.
Last point, due to the batches made of runqueue-depth entries at
once, the higher the depth, the lower the effect of the nice setting,
since values are picked together in batches and placed into a list.
An intuitive approach consists in multiplying the nice value by the
batch size so that tasks can participate in a different batch. And
experimentation shows that this works pretty well.

With a runqueue-depth of 16 and a parasitic load of 16000 requests
per second on 100 streams, nice 0 shows 16000 requests per second,
nice -1024 shows 22000 and nice 1024 shows 10000.

The difference is even bigger with a runqueue depth of 5. At 200
however it's much smoother (16000-22000).
2019-04-15 09:50:56 +02:00
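
A minimal sketch of the idea, assuming insertion keys derive from a
monotonic tick counter; the variable names are hypothetical, only the
nice-times-depth multiplication comes from the message above:

    extern unsigned int rqueue_ticks;  /* hypothetical tick counter     */
    extern int runqueue_depth;         /* tune.runqueue-depth, def. 200 */

    static inline unsigned int rq_insertion_key(int nice)
    {
        /* multiplying the nice value by the batch size moves a niced
         * task into a different batch instead of merely shifting it
         * a few positions within the same one
         */
        return rqueue_ticks + (unsigned int)(nice * runqueue_depth);
    }
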
Willy Tarreau
cde7902ac9 MEDIUM: tasks: improve fairness between the local and global queues
Tasks allowed to run on multiple threads, as well as those scheduled by
one thread to run on another one pass through the global queue. The
local queues only see tasks scheduled by one thread to run on itself.
The tasks extracted from the global queue are transferred to the local
queue when they're picked by one thread. This causes a priority issue
because the global tasks experience a priority contest twice while the
local ones experience it only once. Thus if a task returns still
running, it's immediately reinserted into the local run queue and runs
much faster than the ones coming from the global queue.

Till 1.9 the tasks going through the global queue were mostly :
  - health checks initialization
  - queue management
  - listener dequeue/requeue

These ones are moderately sensitive to unfairness so it was not that
big an issue.

Since 2.0-dev2 with the multi-queue accept, tasks are scheduled to
remote threads on most accept() calls, and it becomes fairly visible
under load that the accept slows down, even for the CLI.

This patch remedies this by consulting both the local and the global
run queues in parallel and by always picking the task whose deadline
is the earliest. This guarantees to maintain an excellent fairness
between the two queues and removes the cascade effect experienced
by the global tasks.

Now the CLI always continues to respond quickly even in presence of
expensive tasks running for a long time.

This patch may possibly be backported to 1.9 if some scheduling issues
are reported but at this time it doesn't seem necessary.
2019-04-15 09:50:56 +02:00
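
A minimal sketch of the dual-queue pick, with the trees abstracted
behind hypothetical peek helpers; this is not HAProxy's actual code:

    #include <stdint.h>

    struct rq_node;                          /* run queue node (opaque) */
    struct rq_node *peek_local_rq(void);     /* hypothetical helpers    */
    struct rq_node *peek_global_rq(void);
    uint32_t rq_node_key(const struct rq_node *n);

    /* consult both queues and pick the task with the earliest
     * deadline, so global tasks no longer contend twice
     */
    struct rq_node *pick_next_task(void)
    {
        struct rq_node *lrq = peek_local_rq();
        struct rq_node *grq = peek_global_rq();

        if (!lrq)
            return grq;
        if (!grq)
            return lrq;
        /* signed difference copes with key wrapping */
        return ((int32_t)(rq_node_key(grq) - rq_node_key(lrq)) < 0)
               ? grq : lrq;
    }
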
Willy Tarreau
24f382f555 CLEANUP: task: do not export rq_next anymore
This one hasn't been used anymore since the scheduler changes after
1.8, but it kept being exported and maintained up to date even though
it's always reset when scanning the trees. Let's stop exporting it
and updating it.
2019-04-15 09:50:56 +02:00
Willy Tarreau
587a8130b1 BUG/MINOR: tasks: make sure the first task to be queued keeps its nice value
The run queue offset computed from the nice value depends on the run
queue size, but for the first task to enter the run queue, this size
is zero and the task gets queued just as if its nice value was zero as
well. This is problematic for example for the CLI socket if another
higher priority task gets queued immediately after, as it can steal
its place.

This patch simply adds one to the rq_size value to make sure the nice
is never multiplied by zero. The way the offset is calculated is
questionable anyway these days, since with the newer scheduler it seems
that just using the nice value as an offset should work (possibly damped
by the task's number of calls).

This fix must be backported to 1.9. It may possibly be backported to
older versions if it proves to make the CLI more interactive.
2019-04-12 15:54:02 +02:00
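
A minimal sketch of the fix; the 1024 scaling divisor is an
assumption, only the rq_size + 1 part comes from the message above:

    static inline int nice_rq_offset(int nice, unsigned int rq_size)
    {
        /* +1 guarantees that the first task entering an empty run
         * queue still gets a nice-derived offset instead of
         * nice * 0 == 0, which let a later task steal its place
         */
        return nice * (int)(rq_size + 1) / 1024;
    }
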
Willy Tarreau
f8bce3125e BUG/MEDIUM: task/threads: address a fairness issue between local and global tasks
It is possible to hit a fairness issue in the scheduler when a local
task runs for a long time (i.e. process_stream() returns running), and
a global task wants to run on the same thread and remains in the global
queue. What happens in this case is that the condition to extract tasks
from the global queue will rarely be satisfied for very low task
counts, since any non-zero queue size multiplied by a thread count
greater than 1 is always greater than the small remaining number of
tasks in the queue. In theory another thread should pick the task,
but we do have some single-threaded tasks in the global queue as well
during inter-thread wakeups.

Note that this can only happen with task counts lower than the thread
counts, typically one task in each queue for more than two threads.

This patch works around the problem by allowing a very small unfairness,
making sure that we can always pick at least one task from the global
queue even if there is already one in the local queue.

A better approach will consist in scanning the two trees in parallel
and always pick the best task. This will be more complex and will
constitute a separate patch.

This fix must be backported to 1.9.
2019-04-12 15:53:43 +02:00
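
A minimal sketch of the workaround, with hypothetical counters; the
proportional split in the comment is a simplification of the real
condition:

    extern unsigned int grq_size;    /* tasks in the global queue */
    extern unsigned int nbthreads;

    static inline unsigned int global_tasks_to_pick(void)
    {
        /* a strict proportional share rounds down to zero for very
         * low task counts, starving the global queue...
         */
        unsigned int n = grq_size / nbthreads;

        /* ...so accept a very small unfairness: always allow at
         * least one global task to be picked, even when one is
         * already pending in the local queue
         */
        if (!n && grq_size)
            n = 1;
        return n;
    }
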
Willy Tarreau
e73256fd2a BUG/MEDIUM: task/h2: add an idempotent task removal function
Previous commit 3ea351368 ("BUG/MEDIUM: h2: Remove the tasklet from the
task list if unsubscribing.") uncovered an issue which needs to be
addressed in the scheduler's API. The function task_remove_from_task_list()
was initially designed to remove a task from the running tasklet list from
within the scheduler, and had to be used in h2 to abort pending I/O events.
However this function was not designed to be idempotent, occasionally
causing a double removal from the tasklet list, with the second doing
nothing but affecting the apparent task count and making haproxy use
100% CPU on some tests consisting of stopping the client during some
transfers. The h2_unsubscribe() function can sometimes be called upon
stream exit after an error where the tasklet was possibly already
removed, so it needs this removal to be idempotent.

This patch does 2 things :
  - it renames task_remove_from_task_list() to
    __task_remove_from_tasklet_list() to discourage users from calling
    it. Also note the fix in the naming since it's a tasklet list and
    not a task list. This function is still used from the scheduler.
  - it adds a new, idempotent, task_remove_from_tasklet_list() function
    which does nothing if the task is already not in the tasklet list.

This patch will need to be backported where the commit above is backported.
2019-03-25 18:02:54 +01:00
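
A minimal sketch of the idempotent wrapper, assuming membership in
the tasklet list can be detected from the list head itself; the list
primitives are simplified:

    struct list { struct list *n, *p; };
    struct tasklet { struct list list; /* ... */ };

    #define LIST_ISEMPTY(lh)  ((lh)->n == (lh))

    /* scheduler-internal, NOT idempotent: the caller must know the
     * tasklet is currently linked in the run list
     */
    void __task_remove_from_tasklet_list(struct tasklet *t);

    /* idempotent variant: safe to call from h2_unsubscribe() and
     * friends even when the tasklet was already removed
     */
    static inline void task_remove_from_tasklet_list(struct tasklet *t)
    {
        if (!LIST_ISEMPTY(&t->list))
            __task_remove_from_tasklet_list(t);
    }
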
Olivier Houchard
1b32790324 BUG/MEDIUM: tasks: Make sure we wake sleeping threads if needed.
When waking a task on a remote thread, we currently check 1) if this
thread was sleeping, and 2) if it was already marked as active before
writing to its pipe. Unfortunately this doesn't always work as desired
because only one thread from the mask is woken up, while the
active_tasks_mask indicates all eligible threads for this task. As a
result, if one multi-thread task (e.g. a health check) wakes up to
run on any thread, and then an accept() dispatches an incoming
connection on thread 2, this thread will already have its bit set in
active_tasks_mask because of the previous wakeup and will not be
woken up.

This is easily noticeable on 2.0-dev by injecting on a multi-threaded
listener with a single connection at a time while health checks are
running quickly in the background : the injection runs slowly with
random response times (the poll timeouts). In 1.9 it affects the
dequeuing of server connections, which occasionally experience pauses
if multiple threads share the same queue.

The correct solution consists in adjusting the sleeping_thread_mask
when waking another thread up. This mask reflects threads that are
sleeping, hence that need to be signaled to wake up. Threads with a
bit in active_tasks_mask already don't have their sleeping_thread_mask
bit set before polling so the principle remains consistent. And by
doing so we can remove the old_active_mask field.

This should be backported to 1.9.
2019-03-15 14:09:39 +01:00
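
A minimal sketch of the principle, using C11 atomics instead of
HAProxy's macros; wake_thread() and the exact bookkeeping are
assumptions:

    #include <stdatomic.h>

    extern _Atomic unsigned long sleeping_thread_mask; /* in poll() */

    void wake_thread(int thr);  /* write to that thread's wake pipe */

    /* wake every target thread actually sleeping in its poller */
    static void wake_sleeping_threads(unsigned long thread_mask)
    {
        unsigned long sleeping =
            atomic_load(&sleeping_thread_mask) & thread_mask;

        while (sleeping) {
            int thr = __builtin_ctzl(sleeping);

            /* clear the bit first so two concurrent wakers don't
             * both write to the same pipe
             */
            atomic_fetch_and(&sleeping_thread_mask, ~(1UL << thr));
            wake_thread(thr);
            sleeping &= sleeping - 1;   /* drop the lowest set bit */
        }
    }
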
Olivier Houchard
4c28328572 MEDIUM: task: Use the new _HA_ATOMIC_* macros.
Use the new _HA_ATOMIC_* macros and add barriers where needed.
2019-03-11 17:02:37 +01:00
Olivier Houchard
d2b5d16187 MEDIUM: various: Use __ha_barrier_atomic* when relevant.
When protecting data modified by atomic operations, use __ha_barrier_atomic*
to avoid unneeded barriers on x86.
2019-03-11 17:02:37 +01:00
Willy Tarreau
155acffc13 BUG/MINOR: task: close a tiny race in the inter-thread wakeup
__task_wakeup() takes care of a small race that exists between threads,
but it uses a store barrier that is not sufficient since apparently the
state read after clearing the leaf_p pointer sometimes is incorrect. This
results in missed wakeups between threads competing at a high rate. Let's
use a full barrier instead to serialize the operations.

This may be backported to 1.9 though it's extremely unlikely that this
bug will ever manifest itself there.
2019-02-04 14:21:35 +01:00
Willy Tarreau
1ee55fddea MEDIUM: tasks: check the global task mask instead of the thread number
When deciding whether to scan the global run queue or not, we currently
check the configured threads number, and if it's 1 we skip the queue
since it's not supposed to be used. However when running with a master
process and multiple threads in the workers, the master will turn this
number back to 1 while some task wakeups might possibly have set bits
in the global tasks mask, thus causing active_tasks_mask to have one
bit permanently set, preventing the process from sleeping.

Instead of checking global.nbthread, let's check for the current
thread's bit in global_tasks_mask. First, it makes this part of the
code more consistent, working like a test-and-set operation; second,
it solves the issue with master+nbthread; and as a bonus it saves a
lock/unlock for each scheduler call when the thread doesn't have a
task in the global run queue.
2018-12-14 15:49:45 +01:00
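
A minimal sketch of the test described above; tid_bit is the current
thread's bit as in HAProxy, the helper name is hypothetical:

    extern volatile unsigned long global_tasks_mask; /* per-thread bits  */
    extern __thread unsigned long tid_bit;           /* 1UL << thread id */

    /* true when some task in the global run queue targets this
     * thread; unlike checking global.nbthread this stays correct in
     * the master, and it spares a lock/unlock when nothing is pending
     */
    static inline int thread_has_global_tasks(void)
    {
        return (global_tasks_mask & tid_bit) != 0;
    }
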
William Lallemand
b582339079 BUG/MEDIUM: mworker: fix several typos in mworker_cleantasks()
Commit 27f3fa5 ("BUG/MEDIUM: mworker: stop every task in the master")
used MAX_THREADS as a mask instead of MAX_THREADS_MASK to clean the
global run queue, and used rq_next (global variable) instead of next_rq.

Renamed next_rq as tmp_rq and next_wq as tmp_wq to avoid confusion.

No backport needed.
2018-12-06 15:38:24 +01:00
William Lallemand
27f3fa56f5 BUG/MEDIUM: mworker: stop every task in the master
The master is not supposed to run any task before the polling loop
(at the moment); the created tasks should be run only in the workers,
and in the master they should be disabled or removed.

No backport needed.
2018-12-06 14:12:58 +01:00
Willy Tarreau
b6b3df3ed3 MEDIUM: initcall: use initcalls for a few initialization functions
signal_init(), init_log(), init_stream(), and init_task() all used to
only preset some values and lists. This needs to be done very early to
provide a reliable interface to all other users. The calls used to be
explicit in haproxy.c:init(). Now they're placed in initcalls at the
STG_PREPARE stage. The functions are not exported anymore.
2018-11-26 19:50:32 +01:00
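
A minimal sketch of staged initcalls, assuming a simple fixed-size
registry in place of HAProxy's linker-section based implementation:

    #include <stddef.h>

    enum init_stage { STG_PREPARE, STG_LOCK, STG_REGISTER, STG_LAST };

    struct initcall { void (*fn)(void *); void *arg; };

    static struct initcall initcalls[STG_LAST][64];
    static size_t nb_initcalls[STG_LAST];

    static void initcall_register(enum init_stage s,
                                  void (*fn)(void *), void *arg)
    {
        initcalls[s][nb_initcalls[s]++] = (struct initcall){ fn, arg };
    }

    /* run from main() for each stage in order, very early */
    static void run_initcalls(enum init_stage s)
    {
        for (size_t i = 0; i < nb_initcalls[s]; i++)
            initcalls[s][i].fn(initcalls[s][i].arg);
    }
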
Willy Tarreau
8ceae72d44 MEDIUM: init: use initcall for all fixed size pool creations
This commit replaces the explicit pool creations that are made in
constructors with a pool registration. Not only does this simplify
the pools declaration (it can be done on a single line after the head
is declared), but it also removes references to pools from within
constructors. The only remaining create_pool() calls are those
performed in init functions after the config is parsed, so there are
no more users of potentially uninitialized pools now.

It has been the opportunity to remove no less than 12 constructors
and 6 init functions.
2018-11-26 19:50:32 +01:00
Willy Tarreau
86abe44e42 MEDIUM: init: use self-initializing spinlocks and rwlocks
This patch replaces a number of __decl_hathread() followed by HA_SPIN_INIT
or HA_RWLOCK_INIT by the new __decl_spinlock() or __decl_rwlock() which
automatically registers the lock for initialization during the STG_LOCK
init stage. A few static modifiers were lost in the process, but since they
were not essential at all it was not worth extending the API to provide such
a variant.
2018-11-26 19:50:32 +01:00
Willy Tarreau
9efd7456e0 MEDIUM: tasks: collect per-task CPU time and latency
Right now we measure for each task the cumulated time spent waiting for
the CPU and using it. The timestamp uses a 64-bit integer to report a
nanosecond-level date. This is only enabled when "profiling.tasks" is
enabled, and consumes less than 1% extra CPU on x86_64 when enabled.
The cumulated processing time and wait time are reported in "show sess".

The task's counters are also reset when an HTTP transaction is reset
since the HTTP part pretends to restart on a fresh new stream. This
will make sure we always report correct numbers for each request in
the logs.
2018-11-22 15:44:21 +01:00
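
A minimal sketch of such accounting, assuming a CLOCK_MONOTONIC
timestamp and hypothetical per-task fields:

    #include <stdint.h>
    #include <time.h>

    static inline uint64_t now_mono_ns(void)
    {
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
    }

    struct task_prof {            /* hypothetical accounting fields */
        uint64_t wake_date;       /* set at task_wakeup() time      */
        uint64_t lat_time;        /* cumulated wait for the CPU     */
        uint64_t cpu_time;        /* cumulated processing time      */
    };

    static void run_task_profiled(struct task_prof *p,
                                  void (*process)(void))
    {
        uint64_t t0 = now_mono_ns();

        p->lat_time += t0 - p->wake_date;    /* time spent queued  */
        process();                           /* the task's handler */
        p->cpu_time += now_mono_ns() - t0;   /* time spent running */
    }
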
Joseph Herlant
cf92b6d332 CLEANUP: Fix typos in the task subsystem
Fix typos in the code comments of the task subsystem.
2018-11-18 22:26:42 +01:00
Willy Tarreau
8d8747abe0 OPTIM: tasks: group all tree roots per cache line
Currently we have per-thread arrays of trees and counts, but these
ones unfortunately share cache lines and are accessed very often. This
patch moves the task-specific stuff into a structure taking a multiple
of a cache line, and has one such per thread. Just doing this has
reduced the cache miss ratio from 19.2% to 18.7% and increased the
12-thread test performance by 3%.

It starts to become visible that we really need a process-wide per-thread
storage area that would cover more than just these parts of the tasks.
The code was arranged so that it's easy to move the pieces elsewhere if
needed.
2018-10-15 19:06:13 +02:00
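
A minimal sketch of the grouping, with simplified types and
illustrative field names:

    struct eb_root { void *b[2]; };          /* simplified tree root */
    struct list    { struct list *n, *p; };  /* simplified list head */

    #define MAX_THREADS 64

    /* everything one thread touches on each scheduler pass lives in
     * a cache-line-multiple block, one per thread, so threads no
     * longer false-share each other's hot counters and tree roots
     */
    struct task_per_thread {
        struct eb_root timers;       /* wait queue tree root */
        struct eb_root rqueue;       /* run queue tree root  */
        struct list task_list;       /* tasklet / run list   */
        int task_list_size;
        int rqueue_size;
    } __attribute__((aligned(64)));

    extern struct task_per_thread task_per_thread[MAX_THREADS];
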
Willy Tarreau
b20aa9eef3 MAJOR: tasks: create per-thread wait queues
Now we still have a main contention point with the timers in the main
wait queue, but the vast majority of the tasks are pinned to a single
thread. This patch creates a per-thread wait queue and queues a task
to the local wait queue without any locking if the task is bound to a
single thread (the current one) otherwise to the shared queue using
locking. This significantly reduces contention on the wait queue. A
test with 12 threads showed 11 ms spent in the WQ lock compared to
4.7 seconds in the same test without this change. The cache miss ratio
decreased from 19.7% to 19.2% on the 12-thread test, and its performance
increased by 1.5%.

Another indirect benefit is that the average queue size is divided
by the number of threads, which roughly removes log(nbthreads) levels
in the tree and further speeds up lookups.
2018-10-15 19:04:40 +02:00
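
A minimal sketch of the local-versus-shared decision, using pthread
locks and hypothetical insert helpers in place of HAProxy's own:

    #include <pthread.h>

    extern __thread unsigned long tid_bit;  /* this thread's bit      */
    extern __thread int tid;                /* this thread's index    */
    extern pthread_rwlock_t wq_lock;        /* shared wait queue lock */

    struct task;
    unsigned long task_thread_mask(const struct task *t); /* hypothetical */
    void wq_insert_local(int thr, struct task *t);        /* hypothetical */
    void wq_insert_shared(struct task *t);                /* hypothetical */

    void task_queue_timer(struct task *t)
    {
        if (task_thread_mask(t) == tid_bit) {
            /* bound to the current thread only: lockless insert */
            wq_insert_local(tid, t);
        } else {
            /* visible to other threads: the shared queue is locked */
            pthread_rwlock_wrlock(&wq_lock);
            wq_insert_shared(t);
            pthread_rwlock_unlock(&wq_lock);
        }
    }
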
Willy Tarreau
0b25d5e99f MEDIUM: task: perform a single tree lookup per run queue batch
The run queue is designed to perform a single tree lookup and to
use multiple passes to eb32sc_next(). The scheduler rework took a
conservative approach first but this is not needed anymore and it
increases the processing cost of process_runnable_tasks() and even
the time during which the RQ lock is held if the global queue is
heavily loaded. Let's simply move the initial lookup to the entry
of the loop like the previous scheduler used to do. This has reduced
by a factor of 5.5 the number of calls to eb32sc_lookup_get() there.
2018-10-10 16:42:46 +02:00
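
A minimal sketch of the loop structure, with the tree API abstracted
behind hypothetical helpers:

    struct rq_node;                              /* tree node (opaque)     */
    struct rq_node *rq_lookup_first(void);       /* the single tree lookup */
    struct rq_node *rq_next(struct rq_node *n);  /* cheap successor step   */
    void run_one(struct rq_node *n);

    void process_batch(int max_processed)
    {
        /* one lookup at the entry of the loop... */
        struct rq_node *n = rq_lookup_first();

        /* ...then only eb32sc_next()-style successor walks */
        while (n && max_processed-- > 0) {
            struct rq_node *next = rq_next(n);

            run_one(n);
            n = next;
        }
    }
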
Olivier Houchard
19bdf2428d MINOR: tasks: Don't special-case when nbthreads == 1
Instead of checking if nbthreads == 1, just AND thread_mask with
all_threads_mask to know if we're supposed to add the task to the local or
the global runqueue.
2018-08-17 14:50:37 +02:00
Olivier Houchard
d8b7a4701d BUG/MEDIUM: tasks: Don't insert in the global rqueue if nbthread == 1
Make sure we don't insert a task in the global run queue if
nbthread == 1, since, as an optimisation, we avoid reading from it in
that case.
2018-08-16 19:25:46 +02:00
Willy Tarreau
85d9b84eb1 BUILD/MINOR: threads: unbreak build with threads disabled
Depending on the optimization level, gcc may complain that wake_thread()
uses an invalid array index for poller_wr_pipe[] when called from
__task_wakeup(). Normally the condition to get there never happens,
but it's simpler to ifdef out this part of the code which is only
used to wake other threads up. No backport is needed, this was brought
by the recent introduction of the ability to wake a sleeping thread.
2018-07-27 17:18:22 +02:00
Olivier Houchard
79321b95a8 MINOR: pollers: Add a way to wake a thread sleeping in the poller.
Add a new pipe, one per thread, so that we can write to it to wake a
thread sleeping in a poller, and use it to wake threads supposed to
take care of a task if they are all sleeping.
2018-07-26 19:09:50 +02:00
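
A minimal sketch of the wake-up write, reusing the poller_wr_pipe[]
name mentioned elsewhere in this log; the byte value is arbitrary:

    #include <unistd.h>

    #define MAX_THREADS 64

    extern int poller_wr_pipe[MAX_THREADS]; /* write ends, per thread */

    /* a single byte on the pipe makes the target thread's poller
     * return immediately so it can pick up the task queued for it
     */
    static inline void wake_thread(int thr)
    {
        char c = 'c';

        (void)write(poller_wr_pipe[thr], &c, 1);
    }
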
Olivier Houchard
eba0c0b51d MINOR: tasks: Make global_tasks_mask volatile.
In order to make sure modifications are noticed by other threads when needed,
make global_tasks_mask volatile.
2018-07-26 19:09:50 +02:00
Olivier Houchard
9b03c0c9a7 MINOR: tasks: Make active_tasks_mask volatile.
To be sure we have the relevant information, make active_tasks_mask volatile.
2018-07-26 19:09:50 +02:00
Olivier Houchard
77551ee8a7 BUG/MEDIUM: tasks: make __task_unlink_rq responsible for the rqueue size.
As __task_wakeup() is responsible for increasing
rqueue_local[tid]/global_rqueue_size, make __task_unlink_rq responsible for
decreasing it, as process_runnable_tasks() isn't the only one that removes
tasks from runqueues.
2018-07-26 16:33:29 +02:00
Olivier Houchard
76e45181b2 MINOR: tasks: Add a flag that tells if we're in the global runqueue.
Now that we have bits available in task->state, add a flag that tells if we're
in the global runqueue or not.
2018-07-26 16:33:10 +02:00
Olivier Houchard
c4aac9effe BUG/MEDIUM: tasks: Make sure there's no task left before considering inactive.
We may remove the thread's bit in active_tasks_mask despite tasks for that
thread still being present in the global runqueue. To fix that, introduce
global_tasks_mask, and set the corresponding bits when we add a task to the
runqueue.
2018-07-26 15:40:22 +02:00
Willy Tarreau
189ea856a7 BUG/MEDIUM: tasks: use atomic ops for active_tasks_mask
We don't have the lock anymore, so we need to protect active_tasks_mask
with atomic operations.
2018-07-26 15:16:43 +02:00
Olivier Houchard
e85ee7b663 BUG/MEDIUM: tasks: Decrement rqueue_size at the right time.
We need to decrement rqueue_size when we remove a task from
rqueue_local, not when we remove it from the task list, or we'd also
decrement it for any tasklet, which was never in the rqueue in the
first place.
2018-07-26 15:00:58 +02:00
Willy Tarreau
9a77186cb0 BUG/MEDIUM: tasks: make sure we pick all tasks in the run queue
Commit 09eeb76 ("BUG/MEDIUM: tasks: Don't forget to increase/decrease
tasks_run_queue.") addressed a count issue in the run queue and uncovered
another issue with the way the tasks are dequeued from the global run
queue. The number of tasks to pick is computed using an integral divide,
which results in up to nbthread-1 tasks never being run. The fix simply
consists in getting rid of the divide and checking the task count in the
loop.

No backport is needed, this is 1.9-specific.
2018-07-26 14:24:46 +02:00
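
A minimal sketch contrasting the buggy divide with the in-loop check;
the counters and helpers are hypothetical:

    extern unsigned int grq_size;   /* tasks in the global run queue */
    extern unsigned int nbthread;

    struct task;
    struct task *pick_one_global_task(void);   /* hypothetical */
    void run_task(struct task *t);

    void process_global_tasks(int max_processed)
    {
        /* the buggy form computed "to_pick = grq_size / nbthread":
         * the integral divide rounds down, so up to nbthread-1
         * tasks could remain unpicked forever. Checking the count
         * in the loop avoids the divide entirely:
         */
        while (max_processed-- > 0 && grq_size) {
            struct task *t = pick_one_global_task();

            if (!t)
                break;
            run_task(t);
        }
    }
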
Olivier Houchard
9db0fedb59 BUG/MINOR: tasklets: Just make sure we don't pass a tasklet to the handler.
We can't just set t to NULL if it's a tasklet, or we'd have a hard time
accessing t->process, so just make sure we pass NULL as the first parameter
of t->process if it's a tasklet.
This should be a non-issue at this point, as tasklets aren't used yet.
2018-06-14 18:57:26 +02:00
Olivier Houchard
b1ca58b245 MINOR: tasks: Don't define rqueue if we're building without threads.
To make sure we don't inadvertently insert a task in the global runqueue,
while only the local runqueue is used without threads, make its definition
and usage conditional on USE_THREAD.
2018-06-06 16:35:12 +02:00
David Carlier
cc0a957a50 MINOR: task: Fix compiler warning.
When waking up a task and checking whether it is a valid entry, cast
explicitly to a void pointer, as HA_ATOMIC_CAS requires, similarly to
commit caa8a37ffe.
2018-06-05 13:55:57 +02:00
Olivier Houchard
082627af77 MINOR: task: Also consider the task list size when getting global tasks.
We're taking tasks from the global runqueue based on the number of tasks
the thread already has in its local runqueue, but now that we have a task
list, we also have to take that into account.
2018-05-28 15:20:59 +02:00
Olivier Houchard
736ea41c6c BUG/MEDIUM: task: Don't forget to decrement max_processed after each task.
When the task list was introduced, we bogusly lost max_processed--,
which means we would execute as many tasks as were present in the
list, and we would never set active_tasks_mask, so the thread would
go to sleep even if more tasks were to be executed.

1.9-dev only, no backport is needed.
2018-05-28 15:20:57 +02:00
Olivier Houchard
1599b80360 MINOR: tasks: Make the number of tasks to run at once configurable.
Instead of hardcoding 200, make the number of tasks to be run configurable
using tune.runqueue-depth. 200 is still the default.
2018-05-26 20:03:24 +02:00
Olivier Houchard
b0bdae7b88 MAJOR: tasks: Introduce tasklets.
Introduce tasklets, lightweight tasks. They have no notion of priority,
they are just run as soon as possible, and will probably be used for I/O
later.

For the moment they're used to replace the temporary thread-local list
that was used in the scheduler. The first part of the struct is common
with tasks so that tasks can be cast to tasklets and queued in this list.
Once a task is in the tasklet list, it has its leaf_p set to 0x1 so that
it cannot accidentally be confused as not being in the queue.

Pure tasklets are identifiable by their nice value of -32768 (which is
normally not possible).
2018-05-26 20:03:19 +02:00
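
A minimal sketch of what such a tasklet may look like; the layout is
illustrative, only the shared-prefix idea and the -32768 nice value
come from the message above:

    #include <stddef.h>

    struct list { struct list *n, *p; };   /* simplified list head */
    struct task;

    #define TASK_NICE_TASKLET  (-32768)    /* impossible for a task */

    /* the first fields mirror struct task so that a tasklet can be
     * cast to a task and queued in the same run list
     */
    struct tasklet {
        struct list list;                  /* run list linkage */
        int state;
        int nice;                          /* always -32768    */
        struct task *(*process)(struct task *t, void *ctx, int state);
        void *context;
    };

    static inline void tasklet_init(struct tasklet *t)
    {
        t->nice    = TASK_NICE_TASKLET;
        t->state   = 0;
        t->process = NULL;
        t->context = NULL;
        t->list.n  = t->list.p = &t->list; /* not queued yet */
    }
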
Olivier Houchard
f6e6dc12cd MAJOR: tasks: Create a per-thread runqueue.
A lot of tasks are run on one thread only, so instead of having them all
in the global runqueue, create a per-thread runqueue which doesn't require
any locking, and add all tasks belonging to only one thread to the
corresponding runqueue.

The global runqueue is still used for non-local tasks, and is visited
by each thread when checking its own runqueue. The nice parameter is
thus used both in the global runqueue and in the local ones. The rare
tasks that are bound to multiple threads will have their nice value
used twice (once for the global queue, once for the thread-local one).
2018-05-26 19:27:29 +02:00
Olivier Houchard
9f6af33222 MINOR: tasks: Change the task API so that the callback takes 3 arguments.
In preparation for thread-specific runqueues, change the task API so that
the callback takes 3 arguments, the task itself, the context, and the state,
those were retrieved from the task before. This will allow these elements to
change atomically in the scheduler while the application uses the copied
value, and even to have NULL tasks later.
2018-05-26 19:23:57 +02:00
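
A minimal sketch of the API change; the typedef name and the state
type are illustrative:

    struct task;

    /* before, the handler re-read context and state from the task:
     *     struct task *(*process)(struct task *t);
     * now both are passed as snapshot copies, so the scheduler may
     * update them atomically while the handler uses its own copy,
     * and a NULL task (a tasklet) becomes possible later
     */
    typedef struct task *(*task_handler)(struct task *t, void *context,
                                         unsigned short state);
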
Olivier Houchard
9b36cb4a41 BUG/MEDIUM: task: Don't free a task that is about to be run.
While running a task, we may try to delete and free a task that is about to
be run, because it's part of the local tasks list, or because rq_next points
to it.
So flag any task that is in the local tasks list to be deleted
instead of run, by setting t->process to NULL, and make rq_next a
global, thread-local variable again, which is modified if we attempt
to delete that task.

Many thanks to PiBa-NL for reporting this and analysing the problem.

This should be backported to 1.8.
2018-05-04 20:11:04 +02:00
Willy Tarreau
d80cb4ee13 MINOR: global: add some global activity counters to help debugging
A number of counters have been added at special places helping better
understanding certain bug reports. These counters are maintained per
thread and are shown using "show activity" on the CLI. The "clear
counters" commands also reset these counters. The output is sent as a
single write(), which currently produces up to about 7 kB of data for
64 threads. If more counters are added, it may be necessary to write
into multiple buffers, or to reset the counters.

To backport to 1.8 to help collect more detailed bug reports.
2018-01-23 15:38:33 +01:00
Willy Tarreau
a24d1d0be4 MINOR: task: align the rq and wq locks
We really don't want them to share the same cache line as they are
expected to be used in parallel. Adding a 64-byte alignment here shows
a performance increase of about 4.5% on task-intensive workloads with
2 to 4 threads.
2017-11-26 11:10:51 +01:00
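
A minimal sketch of the alignment, using pthread lock types as
stand-ins for HAProxy's lock macros:

    #include <pthread.h>

    /* one lock per cache line: the run queue and wait queue locks
     * are taken by different threads in parallel and must not share
     * a 64-byte line
     */
    static pthread_spinlock_t rq_lock __attribute__((aligned(64)));
    static pthread_spinlock_t wq_lock __attribute__((aligned(64)));
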