- Use the automatic alignment feature instead of hardcoding 64 all over
the code.
- This also converts a few bare __attribute__((aligned(X))) to using the
ALIGNED macro.
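  As an illustration, a minimal sketch of what such a macro can look like
  (the names and the value 64 below are assumptions for the example, not
  necessarily the exact definitions used in the tree):

    #include <stdio.h>

    /* hypothetical helper: wrap the GCC/Clang attribute behind one macro so
     * the alignment value lives in a single place instead of hardcoding 64
     */
    #define ALIGNED(x) __attribute__((aligned(x)))

    /* hypothetical "automatic" value: one cache line on common x86_64 CPUs */
    #define THREAD_ALIGNED ALIGNED(64)

    struct per_thread_counters {
        unsigned long long requests;
        unsigned long long bytes;
    } THREAD_ALIGNED;

    int main(void)
    {
        printf("sizeof=%zu align=%zu\n",
               sizeof(struct per_thread_counters),
               _Alignof(struct per_thread_counters));
        return 0;
    }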
Now it will be possible to query any thread's scheduler state, not
only the current one. This aims at simplifying the watchdog checks
for reported threads. The operation is now a simple atomic xchg.
TASK_QUEUED was used to mean "the task has been scheduled to run", while
TASK_IN_LIST was used to mean "the tasklet has been scheduled to run".
Remove TASK_IN_LIST and just use TASK_QUEUED for tasklets instead.
This commit is just cosmetic, and should not have any impact.
This verifies that the scheduler is still ticking without having to
access the activity[] array or keep local copies of the ctxsw
counter. It just tests and sets a flag that is reset after each
return from a ->process() function.
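  A minimal standalone sketch of the idea, with assumed names rather than
  the actual fields:

    #include <stdatomic.h>
    #include <stdbool.h>

    /* hypothetical per-thread flag: set by the watchdog, cleared by the
     * scheduler after each return from a ->process() function
     */
    static _Atomic unsigned int sched_active_flag;

    /* scheduler side: called after each handler returns */
    static inline void sched_mark_progress(void)
    {
        atomic_store_explicit(&sched_active_flag, 0, memory_order_relaxed);
    }

    /* watchdog side: returns true if the scheduler made progress since the
     * previous check; the test-and-set is a single atomic exchange
     */
    static inline bool watchdog_check_progress(void)
    {
        return atomic_exchange_explicit(&sched_active_flag, 1,
                                        memory_order_relaxed) == 0;
    }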
Some notification_* functions were not thread safe by default as they
assumed only one producer would emit events for registered tasks.
While this suited the Lua sockets use case well, it proved to be a
limitation with some other event sources (e.g. the Lua Queue class).
Instead of having to deal with both the non-thread-safe and thread-safe
variants (_mt suffix), which is error prone, let's make the entire API
thread safe regarding the event list.
Pruning functions still require that only one thread executes them,
with Lua this is always the case because there is one cleanup list
per context.
notification_new and notification_wake were historically meant to be
called by a single thread doing both the init and the wakeup for other
tasks waiting on the signals.
In this patch, we extend the API so that notification_new and
notification_wake have thread-safe variants that can safely be used with
multiple threads registering on the same list of events and multiple
threads pushing updates on the list.
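  As a rough illustration of the pattern only (a generic mutex-protected
  event list with hypothetical names, not the actual notification API):

    #include <pthread.h>
    #include <stdlib.h>

    /* hypothetical event entry: a registered waiter a producer can wake */
    struct hyp_event {
        struct hyp_event *next;
        void (*wake)(void *arg);   /* would be a task wakeup in practice */
        void *arg;
    };

    struct hyp_event_list {
        pthread_mutex_t lock;      /* what makes registration/wakeup MT-safe;
                                    * init with PTHREAD_MUTEX_INITIALIZER */
        struct hyp_event *head;
    };

    /* thread-safe "new": any thread may register on the same list */
    static struct hyp_event *hyp_notification_new(struct hyp_event_list *l,
                                                  void (*wake)(void *),
                                                  void *arg)
    {
        struct hyp_event *e = malloc(sizeof(*e));
        if (!e)
            return NULL;
        e->wake = wake;
        e->arg = arg;
        pthread_mutex_lock(&l->lock);
        e->next = l->head;
        l->head = e;
        pthread_mutex_unlock(&l->lock);
        return e;
    }

    /* thread-safe "wake": any producer may push updates on the list */
    static void hyp_notification_wake(struct hyp_event_list *l)
    {
        pthread_mutex_lock(&l->lock);
        for (struct hyp_event *e = l->head; e; e = e->next)
            e->wake(e->arg);
        pthread_mutex_unlock(&l->lock);
    }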
Now when tasklets are woken up via tasklet_wakeup(), tasklet_wakeup_on()
or tasklet_wakeup_after(), either the optional wakeup flags will be used,
or TASK_WOKEN_OTHER will be used.
This allows tasklet handlers waking up for any given cause to notice
whether or not they were also woken for another reason. For example, a
mux handler could skip heavy parts when seeing that TASK_WOKEN_OTHER is
absent, proving that no standard tasklet_wakeup() was done, for example
in response to a subscribe().
The benefit of the TASK_WOKEN_* flags is that they're purged during the
wakeup, and that they're easy to check for using TASK_WOKEN_ANY.
TASK_F_UEVT1 and TASK_F_UEVT2 are also usable for private use (e.g. wakeup
from a stream to a connection inside a mux).
In the future, code dealing with subscribe events should probably start
to set TASK_WOKEN_IO, as is done for upper layers.
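  For illustration, a handler could branch on its <state> argument roughly
  like below; the flag values and the handler are hypothetical, only the
  flag names follow the ones mentioned above:

    #include <stdio.h>

    /* assumed values for the sketch only; the real TASK_* constants differ */
    #define TASK_WOKEN_OTHER  0x0100  /* default cause of tasklet_wakeup() */
    #define TASK_F_UEVT1      0x0200  /* private, e.g. stream-to-mux event */

    /* hypothetical mux-style handler receiving the purged flags in <state> */
    static void hyp_mux_io_cb(unsigned int state)
    {
        if (state & TASK_F_UEVT1) {
            /* cheap private event: only handle that specific condition */
            puts("handling private user event");
        }
        if (state & TASK_WOKEN_OTHER) {
            /* a standard tasklet_wakeup() happened: do the heavy work */
            puts("running full I/O processing");
        }
        /* if TASK_WOKEN_OTHER is absent, heavy parsing can be skipped */
    }

    int main(void)
    {
        hyp_mux_io_cb(TASK_F_UEVT1);
        hyp_mux_io_cb(TASK_WOKEN_OTHER | TASK_F_UEVT1);
        return 0;
    }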
tasklet_wakeup_on() and its derivates (tasklet_wakeup_after() and
tasklet_wakeup()) do not support passing a wakeup cause like
task_wakeup(). This is essentially due to an API limitation caused by
the fact that for a very long time the only reason for waking up was
to process pending I/O. But with the growing complexity of mux tasks,
it is becoming important to be able to skip certain heavy processing
when not strictly needed.
One possibility is to permit the caller of tasklet_wakeup() to pass
flags like task_wakeup(). Instead of going with a complex naming scheme,
let's simply make the flags optional, defaulting to zero when not specified. This
means that tasklet_wakeup_on() now takes either 2 or 3 args, and that the
third one is the optional flags to be passed to the callee. Eligible flags
are essentially the non-persistent ones (TASK_F_UEVT* and TASK_WOKEN_*)
which are cleared when the tasklet is executed. This way the handler
will find them in its <state> argument and will be able to distinguish
various causes for the call.
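  One common way to obtain such optional macro arguments in C is an
  argument-counting selector; here is a standalone sketch of the technique
  with hypothetical names (not the exact macros used here):

    #include <stdio.h>

    /* the real function always takes the flags; the macro fills in 0 when
     * the caller omits them
     */
    static void hyp_wakeup_on(void *tl, int thr, unsigned int flags)
    {
        printf("tl=%p thr=%d flags=%#x\n", tl, thr, flags);
    }

    /* select the expansion depending on whether 2 or 3 args were passed */
    #define HYP_PICK(_1, _2, _3, NAME, ...) NAME
    #define HYP_WAKEUP_ON(...) \
        HYP_PICK(__VA_ARGS__, HYP_WAKEUP_ON3, HYP_WAKEUP_ON2, 0)(__VA_ARGS__)
    #define HYP_WAKEUP_ON2(tl, thr)        hyp_wakeup_on((tl), (thr), 0)
    #define HYP_WAKEUP_ON3(tl, thr, flags) hyp_wakeup_on((tl), (thr), (flags))

    int main(void)
    {
        int dummy;

        HYP_WAKEUP_ON(&dummy, 2);          /* flags default to 0 */
        HYP_WAKEUP_ON(&dummy, 2, 0x0100);  /* explicit wakeup cause */
        return 0;
    }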
Everything in the tasklet layer supports flags, except that they are
just not implemented in the wakeup functions, while they are in the
task_wakeup functions. Initially it was not considered useful to pass
wakeup causes because these were essentially I/O, but with the growing
number of I/O handlers having to deal with various types of operations
(typically cheap I/O notifications on subscribe vs heavy parsing on
application-level wakeups), it would be nice to start to make this
distinction possible.
This commit extends _tasklet_wakeup_on() and _tasklet_wakeup_after()
to pass a set of flags, which is still set to zero for now. This changes
nothing yet, but new functions will come.
Currently tasks being profiled have th_ctx->sched_call_date set to the
current nanosecond in monotonic time. But there's no other way to have
this, despite the scheduler being capable of it. Let's just declare a
new task flag, TASK_F_WANTS_TIME, that makes the scheduler take the time
just before calling the handler. This way, a task that needs nanosecond
resolution on the call date will be able to be called with an up-to-date
date without having to abuse now_mono_time() if not needed. In addition,
if CLOCK_MONOTONIC is not supported (now_mono_time() always returns 0),
the date is set to the most recently known now_ns, which is guaranteed
to be atomic and is only updated once per poll loop.
This date can be more conveniently retrieved using task_mono_time().
This can be useful, e.g. for pacing. The code was slightly adjusted so
as to merge the common parts between the profiling case and this one.
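  A rough standalone sketch of the scheduler-side logic described above; the
  flag and helper names follow the text, everything else is assumed:

    #include <stdint.h>
    #include <time.h>

    #define TASK_F_WANTS_TIME 0x1000   /* value assumed for the sketch */

    static uint64_t now_ns;            /* stand-in for the per-loop date */

    /* stand-in for now_mono_time(): 0 when CLOCK_MONOTONIC is unavailable */
    static uint64_t hyp_now_mono_time(void)
    {
        struct timespec ts;
        if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
            return 0;
        return (uint64_t)ts.tv_sec * 1000000000ULL + ts.tv_nsec;
    }

    /* just before calling the handler: refresh the call date only when
     * profiling is active or the task explicitly asked for it
     */
    static uint64_t hyp_sched_call_date(unsigned int state, int profiling)
    {
        uint64_t date = 0;

        if (profiling || (state & TASK_F_WANTS_TIME)) {
            date = hyp_now_mono_time();
            if (!date)
                date = now_ns;   /* fall back to the per-loop date */
        }
        return date;             /* what task_mono_time() would report */
    }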
task_drop_running() is used to remove the RUNNING bit and to check
whether the task got a new wakeup from itself while it was running. Thus
each time task_drop_running() marks itself as a caller, it in fact
removes the previous caller that woke up the task, such as below:
Tasks activity over 10.439 sec till 0.000 sec ago:
function calls cpu_tot cpu_avg lat_tot lat_avg
task_run_applet 57895273 6.396m 6.628us 2.733h 170.0us <- run_tasks_from_lists@src/task.c:658 task_drop_running
Better not mark this function as a caller and keep the original one:
Tasks activity over 13.834 sec till 0.000 sec ago:
function calls cpu_tot cpu_avg lat_tot lat_avg
task_run_applet 62424582 5.825m 5.599us 5.717h 329.7us <- sc_app_chk_rcv_applet@src/stconn.c:952 appctx_wakeup
It's common to see process_stream() being woken up by wake_expired_tasks
in the profiling output, without knowing which timeout was set to cause
this. By making it possible to record the call places of task_queue()
and task_schedule(), and by making wake_expired_tasks() explicitly not
replace it, we'll be able to know which task_queue() or task_schedule()
was triggered for a given wakeup.
For example below:
process_stream 51200 311.4ms 6.081us 34.59s 675.6us <- run_tasks_from_lists@src/task.c:659 task_queue
process_stream 19227 70.00ms 3.640us 9.813m 30.62ms <- sc_notify@src/stconn.c:1136 task_wakeup
process_stream 6414 102.3ms 15.95us 8.093m 75.70ms <- stream_new@src/stream.c:578 task_wakeup
It's visible that it's run_tasks_from_lists() which in fact acts on
the task->expire returned by the ->process() function itself.
This is used for tracing and profiling. By permitting a NULL caller, we
allow a caller to explicitly pass zero to state that the
current caller must not be replaced. This will soon be used by
wake_expired_tasks() to avoid replacing a caller in the expire loop.
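  A trivial sketch of the convention, with assumed field names:

    /* hypothetical caller-recording helper: a NULL caller means "keep the
     * previously recorded one", which is what wake_expired_tasks() will use
     */
    struct hyp_caller { const char *func; int line; };
    struct hyp_task  { const struct hyp_caller *caller; };

    static inline void hyp_task_set_caller(struct hyp_task *t,
                                           const struct hyp_caller *caller)
    {
        if (caller)
            t->caller = caller;  /* record the new wakeup place */
        /* else: leave the existing caller (e.g. task_queue()) visible */
    }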
We currently know the number of tasks in the run queue that are niced,
and we don't expose it. It's too bad because it can give a hint about
what share of the load is relevant. For example if one runs a Lua
script that was purposely reniced, or if a stats page or the CLI is
hammered with slow operations, seeing them appear there can help
identify what part of the load is not caused by the traffic, and
improve monitoring systems or autoscalers.
The issue addressed by commit fbb934da9 ("BUG/MEDIUM: stick-table: fix
a race condition when updating the expiration task") is still present
when thread groups are enabled, but this time it lies in the scheduler.
What happens is that a task configured to run anywhere might already
have been queued into one group's wait queue. When updating a stick
table entry, sometimes the task will have to be dequeued and requeued.
For this a lock is taken on the current thread group's wait queue lock,
but while this is necessary for the queuing, it's not sufficient for
dequeuing since another thread might be in the process of expiring this
task under its own group's lock which is different. This is easy to test
using 3 stick tables with 1ms expiration, 3 track-sc rules and 4 thread
groups. The process crashes almost instantly under heavy traffic.
One approach could consist in storing the group number the task was
queued under in its descriptor (we don't need 32 bits to store the
thread id, it's possible to use one short for the tid and another
one for the tgrp). Sadly, no safe way to do this was figured, because
the race remains at the moment the thread group number is checked, as
it might be in the process of being changed by another thread. It seems
that a working approach could consist in always having it associated
with one group, and only allowing to change it under this group's lock,
so that any code trying to change it would have to iteratively read it
and lock its group until the value matches, confirming it really holds
the correct lock. But this seems a bit complicated, particularly with
wake_expired_tasks(), which already uses upgradable locks to switch from
read state to a write state.
Given that the shared tasks are not that common (stick-table expirations,
rate-limited listeners, maybe resolvers), it doesn't seem worth the extra
complexity for now. This patch takes a simpler and safer approach
consisting in switching back to a single wq_lock, but still keeping
separate wait queues. Given that shared wait queues are almost always
empty and that otherwise they're scanned under a read lock, the
contention remains manageable and most of the time the lock doesn't
even need to be taken since such tasks are not present in a group's
queue. In essence, this patch reverts half of the aforementioned
patch. This was tested and confirmed to work fine, without observing
any performance degradation under any workload. The performance with
8 groups on an EPYC 74F3 and 3 tables remains twice the one of a
single group, with the contention remaining on the table's lock first.
No backport is needed.
Instead of storing an index that's swapped at every call, let's use the
two pointers as a shifting history. Now we have a permanent "caller"
field that records the last caller, and an optional prev_caller in the
debug section enabled by DEBUG_TASK that keeps a copy of the previous
caller. This way, not only is it much easier to follow what's
happening during debugging, but it saves 8 bytes in the struct task in
debug mode and still keeps it under 2 cache lines in nominal mode, and
this will finally be usable everywhere and later in profiling.
The caller_idx was also used as a hint that the entry was freed, in order
to detect wakeup-after-free. This was changed by setting caller to -1
instead and preserving its value in caller[1].
Finally, the operations were made atomic. That's not critical but since
it's used for debugging and race conditions represent a significant part
of the issues in multi-threaded mode, it seems wise to at least eliminate
some possible factors of faulty analysis.
This reduces the task struct by 8 bytes, reduces the code size a little
bit by simplifying the calling convention (one argument dropped), and
as a bonus provides the function name in the caller.
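  A compact standalone sketch of such a two-entry shifting history, with
  assumed names:

    #include <stdatomic.h>

    struct hyp_caller { const char *func; int line; };

    struct hyp_task {
        const struct hyp_caller *_Atomic caller;  /* last waker */
    #ifdef DEBUG_TASK
        const struct hyp_caller *prev_caller;     /* previous waker */
    #endif
    };

    /* record a new caller: the old value shifts into the debug-only slot */
    static inline void hyp_task_record_caller(struct hyp_task *t,
                                              const struct hyp_caller *who)
    {
        const struct hyp_caller *old;

        old = atomic_exchange_explicit(&t->caller, who, memory_order_relaxed);
    #ifdef DEBUG_TASK
        t->prev_caller = old;
    #else
        (void)old;
    #endif
    }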
It was a mistake to put these two fields in the struct task. This
was added in 1.9 via commit 9efd7456e ("MEDIUM: tasks: collect per-task
CPU time and latency"). These fields are used solely by streams in
order to report the measurements via the lat_ns* and cpu_ns* sample
fetch functions when task profiling is enabled. For the rest of the
tasks, this is pure CPU waste when profiling is enabled, and memory
waste 100% of the time, as the point where these latencies and usages
are measured is in the profiling array.
Let's move the fields to the stream instead, and have process_stream()
retrieve the relevant info from the thread's context.
The struct task is now back to 120 bytes, i.e. almost two cache lines,
with 32 bits still available.
This field is misnamed because its real and important content is the
date the task was woken up, not the date it was called. It temporarily
holds the call date during execution but this remains confusing. In
fact before the latency measurements were possible it was indeed a call
date. Thus it will now be called wake_date.
This change is necessary because a subsequent fix will require the
introduction of the real call date in the thread ctx.
When tasklet latency measurement was enabled in 2.4 with commit b2285de04
("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set"),
the feature was conditioned on DEBUG_TASK because the field would add 8
bytes to the struct tasklet.
This approach was not a very good idea because the struct ends on an int
anyway, thus it finishes with a 32-bit hole regardless of the presence
of this field. What is true however is that adding it turned a 64-byte
struct to 72-byte when caller debugging is enabled.
This patch revisits this with a minor change. Now only the lowest 32
bits of the call date are stored, so they always fit in the remaining
hole, and this allows to remove the dependency on DEBUG_TASK. With
debugging off, we're now seeing a 48-byte struct, and with debugging
on it's exactly 64 bytes, thus still exactly one cache line. 32 bits
allow a latency of 4 seconds on a tasklet, which already indicates a
completely dead process, so there's no point storing the upper bits at
all. And even in the event it would happen once in a while, the lost
upper bits do not really add any value to the debug reports. Also, now
one tasklet wakeup every 4 billion will not be sampled due to the test
on the value itself. Similarly we just don't care, it's statistics and
the measurements are not 9-digit accurate anyway.
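  A small sketch of why the low 32 bits are enough, relying on plain
  unsigned truncation and wrap (field and function names are made up):

    #include <stdint.h>
    #include <stdio.h>

    /* only the low 32 bits of the nanosecond date are stored */
    static uint32_t store_call_date(uint64_t now_ns)
    {
        return (uint32_t)now_ns;
    }

    /* unsigned 32-bit subtraction still yields the right latency as long
     * as it stays under ~4.29 seconds, far beyond any sane tasklet latency
     */
    static uint32_t tasklet_latency_ns(uint32_t stored, uint64_t now_ns)
    {
        return (uint32_t)now_ns - stored;
    }

    int main(void)
    {
        uint64_t woken  = 0x1FFFFFFF0ULL;   /* close to a 32-bit wrap */
        uint64_t called = woken + 1500000;  /* 1.5 ms later */

        printf("latency=%uns\n",
               tasklet_latency_ns(store_call_date(woken), called));
        return 0;
    }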
There's a subtle (harmless) bug in task_instant_wakeup(). As it uses
some tasklet code instead of some task code, the debug part also acts
on the tasklet equivalent, and the call_date is only set when DEBUG_TASK
is set, instead of unconditionally like with tasks. As such, without this
debugging macro, call dates are not updated for tasks woken this way.
There isn't any impact yet because this function was introduced in 2.6 to
solve certain classes of issues and is not used yet, and in the worst case
it would only affect the reported latency time.
This may be backported to 2.6 in case a future fix would depend on it but
currently will not fix existing code.
The tasklet's call date was not reset, so if profiling was enabled while
some tasklets were in the run queue, their initial random value could be
used to preload a bogus initial latency value into the task profiling bin.
Let's just zero the initial value.
This should be backported to 2.4 as it was brought with initial commit
b2285de04 ("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK
is set"). The impact is very low though.
While testing the fix for the previous issue related to reloads with
hard_stop_after, I've met another one which could spuriously produce:
FATAL: bug condition "t->tid >= 0 && t->tid != tid" matched at include/haproxy/task.h:266
In 2.3-dev2, we've added more consistency checks for a number of bug-
inducing programming errors related to the tasks, via commit e5d79bccc
("MINOR: tasks/debug: add a few BUG_ON() to detect use of wrong timer
queue"), and this check comes from there.
The problem that happens here is that when hard-stop-after is set, we
can abort the current thread even if there are still ongoing checks
(or connections in fact). In this case some tasks are present in a
thread's wait queue and are thus bound exclusively to this thread.
During deinit(), the collect and cleanup of all memory areas also
stops servers and kills their check tasks. And calling task_destroy()
does in turn call task_unlink_wq()... except that it's called from
thread 0 which doesn't match the initially planned thread number.
Several approaches are possible. One of them would consist in letting
threads perform their own cleanup (tasks, pools, FDs, etc). This would
possibly be even faster since done in parallel, but some corner cases
might be way more complicated (e.g. who will kill a check's task, or
what to do with a task found in a local wait queue or run queue, and
what about other consistency checks this could violate?).
Thus for now this patch takes an easier and more conservative
approach consisting in admitting that when the process is stopping,
this rule is not necessarily valid, and to let thread 0 collect all
other threads' garbage.
As such this patch can be backported to 2.4.
This one is only used as a hint to improve scheduling latency, so there
is no more point in keeping it global since each thread group handles
its own run queue.
Their migration was postponed for convenience only but now's time for
having the shared wait queues per thread group and not just per process,
otherwise the WQ lock uses a huge amount of CPU alone.
The thread flags are touched a little bit by other threads, e.g. the STUCK
flag may be set by other ones, and they're watched a little bit. As such
we need to use atomic ops only to manipulate them. Most places were already
using them, but here we generalize the practice. Only ha_thread_dump() does
not change because it's run under isolation.
Since we don't mix tasks from different threads in the run queues
anymore, we don't need to use the eb32sc_ trees and we can switch
to the regular eb32 ones. This uses cheaper lookup and insert code,
and a 16-thread test on the queues shows a performance increase
from 570k RPS to 585k RPS.
This bit field used to be a per-thread cache of the result of the last
lookup of the presence of a task for each thread in the shared cache.
Since we now know that each thread has its own shared cache, a test of
emptiness is now sufficient to decide whether or not the shared tree
has a task for the current thread. Let's just remove this mask.
grq_total was only used to know how many tasks were being queued in the
global runqueue for stats purposes, and that was transferred to the per
thread rq_total counter once assigned. We don't need this anymore since
we know where they are, so let's just directly update rq_total and drop
that one.
Since we only use the shared runqueue to put tasks only assigned to
known threads, let's move that runqueue to each of these threads. The
goal will be to arrange an N*(N-1) mesh instead of a central contention
point.
The global_rqueue_ticks had to be dropped (for good) since we'll now
use the per-thread rqueue_ticks counter for both trees.
A few points to note:
- the rq_lock still remains the global one for now so there should not
be any gain in doing this, but should this trigger any regression, it
is important to detect whether it's related to the lock or to the tree.
- there's no more reason for using the scope-based version of the ebtree
now, we could switch back to the regular eb32_tree.
- it's worth checking if we still need TASK_GLOBAL (probably only to
delete a task in one's own shared queue maybe).
This function stopped being used before 2.4 because either the task is
dequeued by the scheduler itself and it knows where to find it, or it's
killed by any thread, and task_kill() must be used for this as only this
one is safe.
It's difficult to say whether task_unlink_rq() is still safe, but once
the lock moves to a thread declared in the task itself, it will be even
more difficult to keep it safe.
Let's just remove it now before someone reuses it and causes trouble.
TASK_SHARED_WQ was set upon task creation and never changed afterwards.
Thus if a task was created to run anywhere (e.g. a check or a Lua task),
all its timers would always pass through the shared timers queue with a
lock. Now we know that tid<0 indicates a shared task, so we can use that
to decide whether or not to use the shared queue. The task might be
migrated using task_set_affinity() but it's always dequeued first so
the check will still be valid.
Not only this removes a flag that's difficult to keep synchronized with
the thread ID, but it should significantly lower the load on systems with
many checks. A quick test with 5000 servers and fast checks that were
saturating the CPU shows that the check rate increased by 20% (hence the
CPU usage dropped by 17%). It's worth noting that run_task_lists() almost
no longer appears in perf top now.
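  The queue selection itself boils down to something like the following
  sketch (structure and names assumed):

    struct hyp_wq;                      /* stand-in for the ebtree queue */
    extern struct hyp_wq  shared_wq;    /* process-wide, lock-protected */
    extern struct hyp_wq  thread_wq[];  /* one lockless queue per thread */

    struct hyp_task { int tid; };

    /* tid < 0 now means "may run anywhere", so only those tasks pay for
     * the shared, locked wait queue; thread-bound ones use their local one
     */
    static inline struct hyp_wq *hyp_task_wq(const struct hyp_task *t)
    {
        return (t->tid < 0) ? &shared_wq : &thread_wq[t->tid];
    }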
As previously advertised in comments, the mask-based task_new() is now
gone. The low-level function now is task_new_on() which takes a thread
number or a negative value for "any thread", which is turned to zero
for thread-less builds since there's no shared WQ in this case. The
task_new_here() and task_new_anywhere() functions were adjusted
accordingly.
At several places we need to figure the ID of the first thread allowed
to run a task. Till now this was performed using my_ffsl(t->thread_mask)
but since we now have the thread ID stored into the task, let's use it
instead. This is tagged major because it starts to assume that tid<0 is
strictly equivalent to atleast2(thread_mask), and that, as such, the
current thread is among the allowed ones.
The tasks currently rely on a mask but do not have an assigned thread ID,
contrary to tasklets. However, in practice they're either running on a
single thread or on any thread, so that it will be worth simplifying all
this in order to ease the transition to the thread groups.
This patch introduces a "tid" field in the task struct, that's either
the number of the thread the task is attached to, or a negative value
if the task is not bound to a thread (i.e. its mask is all_threads_mask).
The new ID is only set and updated but not used yet.
We want to be able to schedule a tasklet onto a thread after the current tasklet
is done. What we have to do is to insert this tasklet at the head of the thread
task list. Furthermore, we would like to serialize the tasklets. They must be
run in the same order as they were scheduled. This is
implemented by passing a list of tasklets as a parameter (see the <head>
parameter), which must be reused for subsequent calls.
_tasklet_wakeup_after_on() is implemented to accomplish this job.
tasklet_wakeup_after_on() and tasklet_wakeup_after() are only wrapper macros
around _tasklet_wakeup_after_on(). tasklet_wakeup_after_on() does exactly the
same thing as _tasklet_wakeup_after_on() without having to pass the filename
and line number as parameters (useful when DEBUG_TASK is enabled).
tasklet_wakeup_after() also hides the thread parameter, which is the <tl>
tasklet's thread ID.
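  As an illustration of the serialization via the insertion cursor, here is
  a generic standalone sketch with hypothetical names (not the actual
  implementation):

    struct hyp_tasklet {
        struct hyp_tasklet *next;
        const char *name;
    };

    /* per-thread list of tasklets to run right after the current one */
    static struct hyp_tasklet *run_after_head;

    /* <head> acts as an insertion cursor: it points to the last tasklet
     * queued by the current handler so successive wakeups keep their order
     */
    static void hyp_wakeup_after(struct hyp_tasklet **head,
                                 struct hyp_tasklet *tl)
    {
        if (!*head) {
            /* first one: becomes the new head of the "after" list */
            tl->next = run_after_head;
            run_after_head = tl;
        } else {
            /* subsequent ones: appended right after the previous insert */
            tl->next = (*head)->next;
            (*head)->next = tl;
        }
        *head = tl;  /* remember the insertion point for the next call */
    }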
This macro was used both for binding and for lookups. When binding tasks
or FDs, using all_threads_mask instead is better as it will later be per
group. For lookups, ~0UL always does the job. Thus in practice the macro
was already almost not used anymore since the rest of the code could run
fine with a constant of all ones there.
Instead of having a global mask of all the profiled threads, let's have
one flag per thread in each thread's flags. They are never accessed more
than one at a time and are better located inside the threads' contexts for
both performance and scalability.
This reverts commit d9404b464faae3340ac1745b594929e4b7edd650.
In fact, there is a BUG_ON() in the __task_free() function to make sure the
task is no longer in the wait-queue or the run-queue. Because the patch tries
to fix a "leak" on deinit, it is safer to revert it. There is no reason to
introduce a potential bug for this kind of issue, and no reason to impact the
normal use-cases at runtime with additional conditions only to remove a task
on deinit.
A running or queued task is not released when task_destroy() is called,
except if it is the current task. Its process function is set to NULL and we
let the scheduler release the task. However, when HAProxy is stopping, it
never happens and some tasks may leak. To fix the issue, we now also rely on
the global MODE_STOPPING flag. When this flag is set, the task is always
immediately released.
This patch should fix the issue #1714. It could be backported as far as 2.4
but it's not a real problem in practice because it only happens on
deinit. The leak exists on previous versions, but the MODE_STOPPING flag
does not.
This function's purpose is to wake up either a local or remote task,
bypassing the tree-based run queue. It is meant for fast wakeups that
are supposed to be equivalent to those used with tasklets, i.e. a task
had to pause some processing and can complete (typically a resource
becomes available again). In all cases, it's important to keep in mind
that the task must have gone through the regular scheduling path before
being blocked, otherwise the task priorities would be ignored.
The reason for this is that some wakeups are massively inter-thread
(e.g. server queues), and that these inter-thread wakeups cause a huge
contention on the shared runqueue lock. A user reported 47% CPU spent
in process_runnable_tasks with only 32 threads and 80k requests in
queues. With this mechanism, purely one-to-one wakeups can avoid
taking the lock thanks to the mt_list used for the shared tasklet
queue.
Right now the shared tasklet queue moves everything to the TL_URGENT
queue. It's not dramatic but it would seem better to have a new shared
list dedicated to tasks, and that would deliver into TL_NORMAL, for an
even better fairness. This could be improved in the future.
Ilya reported in issue #1638 that Clang 14 has invented a new warning
that encourages modifying the code in a way that is not always
equivalent, by turning "|" to "||" between some logical operators,
except that the first one guarantees that all members of the expression
will always be evaluated while the latter will stop at the first one
which is true!
This warning triggers in thread_has_tasks(), which is not sensitive to
such change of behavior but which is built this way because it results
in branchless code for something that most often evaluates to false for
all terms. As such it was out of question to turn this to less efficient
compare-and-jump that needlessly pollute the branch predictor, so the
workaround consists in casting each expression to (int). It was verified
that the code is the same.
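  Schematically, the branchless pattern and the workaround look like this
  standalone reduction (not the exact thread_has_tasks() code):

    #include <stdbool.h>

    static unsigned int pending_urgent, pending_normal, pending_bulk;

    /* branchless variant: all terms are evaluated and OR-ed bitwise;
     * casting each one to (int) silences the new Clang 14 warning without
     * changing the generated code
     */
    static inline bool hyp_thread_has_tasks(void)
    {
        return !!((int)!!pending_urgent |
                  (int)!!pending_normal |
                  (int)!!pending_bulk);
    }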
Yet another example of how-to-introduce-bugs-by-fixing-valid-code
through warnings invented around a beer without thinking longer!
This may need to be backported to a few older branches in case this
compiler lands in recent distros or if gcc finds it wise to imitate it.