mirror of https://git.haproxy.org/git/haproxy.git/ (synced 2025-10-31 08:30:59 +01:00)

2021-11-17 - Scheduler API


1. Background
-------------

The scheduler relies on two major parts:
  - the wait queue or timers queue, which contains an ordered tree of the next
    timers to expire

  - the run queue, which contains tasks that were already woken up and are
    waiting for a CPU slot to execute.

There are two types of schedulable objects in HAProxy:
  - tasks: they contain one timer and can be in the run queue without leaving
    their place in the timers queue.

  - tasklets: they do not have the timers part and are either sleeping or
    running.

Both the timers queue and the run queue exist in two forms: shared between all
threads, and per-thread. A task or tasklet may only be queued in a single one
of each at a time. The thread-local queues are not thread-safe while the shared
ones are. This means that an object may only be manipulated if it is in the
calling thread's local queue, or if it is in a shared queue and the
corresponding lock has been taken first. As such, tasks and tasklets are
usually pinned to threads and do not move, or only in very specific ways not
detailed here.

In case of doubt, keep in mind that it's not permitted to manipulate another
thread's private task or tasklet, and that any task held by another thread
might vanish while it's being looked at.

Internally a large part of the task and tasklet struct is shared between
the two types, which reduces code duplication and eases the preservation
of fairness in the run queue by interleaving all of them. As such, some
fields or flags may not always be relevant to tasklets and may be ignored.


Tasklets do not use a thread mask but use a thread ID instead, to which they
are bound. If the thread ID is negative, the tasklet is not bound but may only
be run on the calling thread.


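The shared layout mentioned above can be illustrated with a minimal sketch. It
is not HAProxy's actual definition: struct toy_common, struct toy_task, struct
toy_tasklet and toy_state_of() are hypothetical names, and the field list is an
assumption reduced to what this section mentions (a state, a thread ID, a
context, and a timer for tasks only).

```c
#include <stddef.h>

/* Hypothetical part shared by both object types (assumed field list). */
struct toy_common {
	unsigned int state;   /* wake-up reasons and persistent flags */
	int tid;              /* owning thread, negative when not bound */
	void *context;        /* opaque pointer handed to the handler */
};

/* A task is the shared part followed by its timer part. */
struct toy_task {
	struct toy_common c;  /* shared part, must come first */
	unsigned int expire;  /* expiration date, tasks only */
};

/* A tasklet only carries the shared part. */
struct toy_tasklet {
	struct toy_common c;  /* shared part, must come first */
};

/* Read the state of either object type: both start with the same
 * struct toy_common, so a pointer to the object may be converted to a
 * pointer to its first member.
 */
static unsigned int toy_state_of(const void *obj)
{
	return ((const struct toy_common *)obj)->state;
}
```

With such a layout, code walking a run queue can read the state of a task or a
tasklet through toy_state_of() without knowing which of the two it is looking
at, which is the kind of code sharing the paragraph above refers to.
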
2. API
------

Only a few functions are exposed by the scheduler. A few more are in fact
accessible, but those not documented here should rather be avoided, or used
only when absolutely certain they're suitable, as some have delicate corner
cases. When in doubt, checking the sched.pdf diagram may help.

int total_run_queues()
        Return the approximate number of tasks in run queues. This is racy
        and a bit inaccurate as it iterates over all queues, but it is
        sufficient for stats reporting.

int task_in_rq(t)
        Return non-zero if the designated task is in the run queue (i.e. it was
        already woken up).

int task_in_wq(t)
        Return non-zero if the designated task is in the timers queue (i.e. it
        has a valid timeout and will eventually expire).

int thread_has_tasks()
        Return non-zero if the current thread has some work to be done in the
        run queue. This is used to decide whether or not to sleep in poll().

void task_wakeup(t, f)
        Make sure task <t> will wake up, that is, will execute at least once
        after the function is called. The task flags <f> will be ORed into the
        task's state, and must be taken among the TASK_WOKEN_* flags
        exclusively. In multi-threaded environments it is safe to wake up
        another thread's task, and even if that thread is sleeping it will be
        woken up. Users have to keep in mind that a task running on another
        thread might very well finish and go back to sleep before the function
        returns. It is permitted to wake the current task up, in which case it
        will be scheduled to run another time after it returns to the
        scheduler.

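The way wake-up reasons accumulate can be sketched as follows. This is only a
toy model and not HAProxy's implementation: struct toy_task, toy_wakeup() and
the TOY_* constants (including their numeric values) are hypothetical, and the
use of C11 atomics is an assumption made to show why concurrent callers are
safe.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Placeholder flag values, chosen for this sketch only. */
#define TOY_QUEUED     0x0004u  /* task already sits in a run queue */
#define TOY_WOKEN_IO   0x0400u
#define TOY_WOKEN_MSG  0x1000u
#define TOY_WOKEN_ANY  0x7f00u  /* mask of all wake-up reasons */

struct toy_task {
	atomic_uint state;
};

/* OR the wake-up reason <f> into the task's state. Bits outside of the
 * wake-up reasons are dropped. Returns true when this call is the one
 * that must insert the task into a run queue, false when the task was
 * already queued and only the reason was merged.
 */
static bool toy_wakeup(struct toy_task *t, unsigned int f)
{
	unsigned int old;

	old = atomic_fetch_or(&t->state, (f & TOY_WOKEN_ANY) | TOY_QUEUED);
	return !(old & TOY_QUEUED);
}
```

In this model, two wake-ups issued before the task runs result in a single
execution which sees both reasons at once, which matches the "at least once"
wording above: the number of calls to the handler is not the number of
wake-ups.
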
struct task *task_unlink_wq(t)
        Remove the task from the timers queue if it was in it, and return it.
        It may only be done for a task local to the calling thread, or for a
        shared task that might be in the shared queue. It must not be done for
        another thread's task.

void task_queue(t)
        Place or update task <t> into the timers queue, where it may already
        be, scheduling it for an expiration at date t->expire. If t->expire is
        infinite, nothing is done, so it's safe to call this function without
        first checking the expiration date. It is only valid to call this
        function for local tasks or for shared tasks which have the calling
        thread in their thread mask.

void task_set_thread(t, id)
        Change task <t>'s thread ID to new value <id>. This may only be
        performed by the task itself while running. This is only used to let a
        task voluntarily migrate to another thread. Thread id -1 is used to
        indicate "any thread". It's ignored and replaced by zero when threads
        are disabled.

void tasklet_wakeup(tl)
        Make sure that tasklet <tl> will wake up, that is, will execute at
        least once. The tasklet will run on its assigned thread, or on any
        thread if its tid is negative.

void tasklet_wakeup_on(tl, thr)
        Make sure that tasklet <tl> will wake up on thread <thr>, that is, will
        execute at least once. The designated thread may only differ from the
        calling one if the tasklet is already configured to run on another
        thread, and it is not permitted to self-assign a tasklet if its tid is
        negative, as it may already be scheduled to run somewhere else. When in
        doubt, only use tasklet_wakeup(), which will pick the tasklet's
        assigned thread ID.

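The choice of the target thread for a tasklet, as described in the Background
section and in tasklet_wakeup() above, can be summarized by the small sketch
below. The names struct toy_tasklet and toy_target_thread() are hypothetical,
and the function only models the documented rule, not the real queuing code.

```c
/* Minimal stand-in for a tasklet: only the thread ID matters here. */
struct toy_tasklet {
	int tid;  /* assigned thread, or negative when not bound */
};

/* Return the thread on which a wake-up would queue tasklet <tl> when
 * issued from thread <calling_tid>: the assigned thread when the tasklet
 * is bound, otherwise the calling thread.
 */
static int toy_target_thread(const struct toy_tasklet *tl, int calling_tid)
{
	return tl->tid < 0 ? calling_tid : tl->tid;
}
```

For example, a tasklet bound to thread 3 is always queued on thread 3 whoever
wakes it up, while a tasklet with a negative tid woken up from thread 5 is
queued on thread 5.
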
struct tasklet *tasklet_new()
        Allocate a new tasklet and set it to run by default on the calling
        thread. The caller may change its tid to another one before using it.
        The new tasklet is returned.

struct task *task_new_anywhere()
        Allocate a new task to run on any thread, and return the task, or NULL
        in case of allocation issue. Note that such tasks will be marked as
        shared and will go through the locked queues, thus their activity will
        be heavier than for other ones. See also task_new_here().

struct task *task_new_here()
        Allocate a new task to run on the calling thread, and return the task,
        or NULL in case of allocation issue.

struct task *task_new_on(t)
        Allocate a new task to run on thread <t>, and return the task, or NULL
        in case of allocation issue.

void task_destroy(t)
        Destroy this task. The task will be unlinked from any timers queue,
        and either immediately freed, or asynchronously killed if currently
        running. This may only be done by one of the threads this task is
        allowed to run on. Developers must not forget that the task's memory
        area is not always immediately freed, and that certain misuses might
        only show their effects later down the chain (e.g. use-after-free).

void tasklet_free(tl)
        Free tasklet <tl>, which must not be running. It may thus only be
        called by the thread responsible for the tasklet, typically from the
        tasklet's process() function itself.

void task_schedule(t, d)
        Schedule task <t> to run no later than date <d>. If the task is already
        running, or scheduled for an earlier instant, nothing is done. If the
        task was not queued or was scheduled to run later, its timer entry
        will be updated. This function assumes that it will never be called
        with a timer in the past nor with TICK_ETERNITY. Only one of the
        threads assigned to the task may call this function.

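The "no later than" rule of task_schedule() can be illustrated with the toy
model below. It is a sketch under assumptions, not the real code: struct
toy_task, toy_before(), toy_schedule() and TOY_ETERNITY are hypothetical, the
timers queue is reduced to a boolean, and dates are assumed to be wrapping
unsigned tick counters compared through a signed difference.

```c
#include <stdbool.h>

#define TOY_ETERNITY 0u  /* placeholder value meaning "no expiration" */

struct toy_task {
	unsigned int expire;  /* expiration date, or TOY_ETERNITY */
	bool running;         /* currently executing its handler */
	bool in_wq;           /* currently present in the timers queue */
};

/* Wrapping-safe "a is strictly before b". It relies on the usual
 * two's complement conversion of the unsigned difference to int.
 */
static bool toy_before(unsigned int a, unsigned int b)
{
	return (int)(a - b) < 0;
}

/* Make sure <t> expires no later than <when>. As documented, <when> is
 * assumed to be neither in the past nor TOY_ETERNITY. Returns true when
 * the timer entry was created or moved, false when nothing was done.
 */
static bool toy_schedule(struct toy_task *t, unsigned int when)
{
	if (t->running)
		return false;  /* running: nothing is done */
	if (t->in_wq && t->expire != TOY_ETERNITY && !toy_before(when, t->expire))
		return false;  /* already due no later than <when> */
	t->expire = when;
	t->in_wq = true;
	return true;
}
```

With this model, scheduling an idle task for date 1000 programs its timer, a
later request for date 2000 leaves it untouched, and a request for date 500
moves the expiration earlier.
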
The task's ->process() function receives the following arguments:

  - struct task *t: a pointer to the task itself. It is always valid.

  - void *ctx     : a copy of the task's ->context pointer at the moment
                    the ->process() function was called by the scheduler. A
                    function must use this and not task->context, because
                    task->context might possibly be changed by another thread.
                    For instance, the muxes' takeover() functions do this.

  - uint state    : a copy of the task's ->state field at the moment the
                    ->process() function was executed. A function must use
                    this and not task->state as the latter misses the wakeup
                    reasons and may constantly change during execution along
                    concurrent wakeups (threads or signals).

The possible state flags to use during a call to task_wakeup() or seen by the
task being called are the following; they're automatically cleaned from the
state field before the call to ->process():

  - TASK_WOKEN_INIT    each creation of a task causes a first wakeup with this
                       flag set. Applications should not set it themselves.

  - TASK_WOKEN_TIMER   this indicates the task's expire date was reached in the
                       timers queue. Applications should not set it themselves.

  - TASK_WOKEN_IO      indicates the wake-up happened due to I/O activity. Now
                       that all low-level I/O processing happens on tasklets,
                       this notion of I/O is now application-defined (for
                       example stream-interfaces use it to notify the stream).

  - TASK_WOKEN_SIGNAL  indicates that a signal the task was subscribed to was
                       received. Applications should not set it themselves.

  - TASK_WOKEN_MSG     any application-defined wake-up reason, usually for
                       inter-task communication (e.g. filters vs streams).

  - TASK_WOKEN_RES     a resource the task was waiting for was finally made
                       available, allowing the task to continue its work. This
                       is essentially used by buffers and queues. Applications
                       may carefully use it for their own purpose if they're
                       certain not to rely on existing ones.

  - TASK_WOKEN_OTHER   any other application-defined wake-up reason.


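A handler using its <ctx> and <state> arguments as recommended above could look
like the sketch below. It is self-contained and does not use HAProxy's headers:
struct toy_task, struct toy_conn, toy_process() and the TOY_WOKEN_* values are
hypothetical stand-ins, and the convention of returning the task pointer is an
assumption of this example that is not described in this document.

```c
#include <stddef.h>

/* Placeholder wake-up reasons, with values chosen for this sketch only. */
#define TOY_WOKEN_TIMER 0x0200u
#define TOY_WOKEN_IO    0x0400u

struct toy_task {
	void *context;       /* may be changed by another thread */
	unsigned int state;  /* wake-up reasons are already cleared here */
};

/* Hypothetical application object attached to the task. */
struct toy_conn {
	int timeouts;   /* number of timer expirations seen */
	int io_events;  /* number of I/O notifications seen */
};

/* Handler with the three documented arguments. It only relies on the
 * <ctx> and <state> snapshots, never on t->context nor t->state.
 */
static struct toy_task *toy_process(struct toy_task *t, void *ctx,
                                    unsigned int state)
{
	struct toy_conn *c = ctx;

	if (state & TOY_WOKEN_TIMER)
		c->timeouts++;   /* the expiration date was reached */
	if (state & TOY_WOKEN_IO)
		c->io_events++;  /* application-defined I/O activity */
	return t;                /* assumption: the task remains usable */
}
```

Several reasons may be reported in a single call, so the handler tests each
flag independently instead of using an if/else chain.
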
In addition, a few persistent flags may be observed or manipulated by the
application, both for tasks and tasklets:

  - TASK_SELF_WAKING   when set, indicates that this task was found waking
                       itself up, and its class will change to bulk processing.
                       If this behavior is under control and only temporarily
                       expected, and it is not expected to happen again, it may
                       make sense to reset this flag from the ->process()
                       function itself.

  - TASK_HEAVY         when set, indicates that this task does such heavy
                       processing that it will become mandatory to give back
                       control to I/Os, otherwise big latencies might occur. It
                       may be set by an application that expects something
                       heavy to happen (tens to hundreds of microseconds), and
                       reset once finished. One such user is the TLS stack,
                       which sets it when an imminent crypto operation is
                       expected.

  - TASK_F_USR1        This is the first application-defined persistent flag.
                       It is always zero unless the application changes it. An
                       example use case is the I/O handler for backend
                       connections, to mention whether the connection is safe
                       to use or might have recently been migrated.

Finally, when built with -DDEBUG_TASK, an extra sub-structure "debug" is added
to both tasks and tasklets to note the code locations of the last two calls to
task_wakeup() and tasklet_wakeup().