MT_LIST: multi-thread aware doubly-linked lists Abstract -------- mt_lists are a form of doubly-linked lists that support thread-safe standard list operations such as insert / append / delete / pop, as well as a safe iterator that supports deletion and concurrent use. Principles ---------- The lists are designed to minimize contention in environments where elements may be concurrently manipulated at different locations. The principle is to act on the links between the elements instead of the elements themselves. This is achieved by temporarily "cutting" these links, which effectively consists in replacing the ends of the links with special pointers serving as a lock, called MT_LIST_BUSY. An element is considered locked when both its next and prev pointers are equal to this MT_LIST_BUSY pointer. A link is locked when both of its ends are equal to this MT_LIST_BUSY pointer, i.e. the next pointer of the element at the source of the link and the prev pointer of the element the link points to. It's worth noting that a locked link by definition no longer exists since neither end knows where it was pointing to, unless a backup of it was made prior to locking it. The next and prev pointers are replaced by the list manipulation functions using atomic exchange. This means that the caller knows if the element it tries to replace was already locked or if it owns it. In order to replace a link, both ends of the link must be owned by the thread willing to replace it. Similarly when adding or removing an element, both ends of the elements must be owned by the thread trying to manipulate the element. Appending or inserting elements comes in two flavors: the standard one which considers that the element is already owned by the thread and ignores its contents; this is the most common usage for a link that was just allocated or extracted from a list. The second flavor doesn't trust the thread's ownership of the element and tries to own it prior to adding the element; this may be used when this element is a shared one that needs to be placed into a list. Removing an element always consists in owning the two links surrounding it, hence owning the 4 pointers. Scanning the list consists in locking the element to (re)start from, locking the link used to jump to the next element, then locking that element and unlocking the previous one. All types of concurrency issues are supported there, including elements disappearing while trying to lock them. It is perfectly possible to have multiple threads scan the same list at the same time, and it's usually efficient. However, if those threads face a single contention point (e.g. pause on a locked element), they may then restart working from the same point all at the same time and compete for the same links and elements for each step, which will become less efficient. However, it does work fine. There's currently no support for shared locking (e.g. rwlocks), elements and links are always exclusively locked. Since locks are attempted in a sequence, this creates a nested lock pattern which could theoretically cause deadlocks if adjacent elements were locked in parallel. This situation is handled using a rollback mechanism: if any thread fails to lock any element or pointer, it detects the conflict with another thread and entirely rolls back its operations in order to let the other thread complete. This rollback is what aims at guaranteeing forward progress. There is, however, a non-null risk that both threads spend their time rolling back and trying again. This is covered using exponential back-off that may grow to large enough values to let a thread lock all the pointer it needs to complete an operation. Other mechanisms could be implemented in the future such as rotating priorities or random lock numbers to let both threads know which one must roll back and which one may continue. Due to certain operations applying to the type of an element (iterator, element retrieval), some parts do require macros. In order to avoid keeping too confusing an API, all operations are made accessible via macros. However, in order to ease maintenance and improve error reporting when facing unexpected arguments, all the code parts that were compatible have been implemented as inlinable functions instead. And in order to help with performance profiling, it is possible to prevent the compiler from inlining all the functions that may loop. As a rule of thumb, operations which only exist as macros do modify one or more of their arguments. All exposed functions are called "mt_list_something()", all exposed macros are called "MT_LIST_SOMETHING()", possibly mapping 1-to-1 to the equivalent function, and the list element type is called "mt_list". Operations ---------- mt_list_append(el1, el2) Adds el2 before el1, which means that if el1 is the list's head, el2 will effectively be appended to the end of the list. before: +---+ |el2| +---+ V +---+ +---+ +---+ +---+ +---+ +---+ #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ +---+ #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<===>|el2|<=# # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # #=====================================================================# mt_list_try_append(el1, el2) Tries to add el2 before el1, which means that if el1 is the list's head, el2 will effectively be appended to the end of the list. el2 will only be added if it's deleted (loops over itself). The operation will return zero if this is not the case (el2 is not empty anymore) or non-zero on success. before: #=========# # +---+ # #=>|el2|<=# +---+ V +---+ +---+ +---+ +---+ +---+ +---+ #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ +---+ #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<===>|el2|<=# # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # #=====================================================================# mt_list_insert(el1, el2) Adds el2 after el1, which means that if el1 is the list's head, el2 will effectively be insert at the beginning of the list. before: +---+ |el2| +---+ V +---+ +---+ +---+ +---+ +---+ +---+ #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ +---+ #=>|el1|<===>|el2|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # #=====================================================================# mt_list_try_insert(el1, el2) Tries to add el2 after el1, which means that if el1 is the list's head, el2 will effectively be inserted at the beginning of the list. el2 will only be added if it's deleted (loops over itself). The operation will return zero if this is not the case (el2 is not empty anymore) or non-zero on success. before: #=========# # +---+ # #=>|el2|<=# +---+ V +---+ +---+ +---+ +---+ +---+ +---+ #=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ +---+ #=>|el1|<===>|el2|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # #=====================================================================# mt_list_delete(el1) Removes el1 from the list, and marks it as deleted, wherever it is. If the element was already not part of a list anymore, 0 is returned, otherwise non-zero is returned if the operation could be performed. before: +---+ +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |<===>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # #=====================================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# +---+ #=>|el1|<=# # +---+ # #=========# mt_list_behead(l) Detaches a list of elements from its head with the aim of reusing them to do anything else. The head will be turned to an empty list, and the list will be partially looped: the first element's prev will point to the last one, and the last element's next will be NULL. The pointer to the first element is returned, or NULL if the list was empty. This is essentially used when recycling lists of unused elements, or to grab a lot of elements at once for local processing. It is safe to be run concurrently with the insert/append operations performed at the list's head, but not against modifications performed at any other place, such as delete operation. before: +---+ +---+ +---+ +---+ +---+ +---+ +---+ #=>| L |<===>| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # #=====================================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ +---+ #=>| L |<=# ,--| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<-. # +---+ # | +---+ +---+ +---+ +---+ +---+ +---+ | #=========# `-----------------------------------------------------------' mt_list_pop(l) Removes the list's first element, returns it deleted. If the list was empty, NULL is returned. When combined with mt_list_append() this can be used to implement MPMC queues for example. A macro MT_LIST_POP() is provided for a more convenient use; instead of returning the list element, it will return the structure holding the element, taking care of preserving the NULL. before: +---+ +---+ +---+ +---+ +---+ +---+ +---+ #=>| L |<===>| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ +---+ # #=====================================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>| L |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# +---+ #=>| A |<=# # +---+ # #=========# _mt_list_lock_next(elt) Locks the link that starts at the next pointer of the designated element. The link is replaced by two locked pointers, and a pointer to the next element is returned. The link must then be unlocked using _mt_list_unlock_next() passing it this pointer, or mt_list_unlock_link(). This function is not intended to be used by applications, and makes certain assumptions about the state of the list pertaining to its use in iterators. before: +---+ +---+ +---+ +---+ +---+ +---+ #=>|elt|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>|elt|x x| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# Return value: &B _mt_list_unlock_next(elt, back) Unlocks the link that starts at the next pointer of the designated element and is supposed to end at . This function is not intended to be used by applications, and makes certain assumptions about the state of the list pertaining to its use in iterators. before: back \ +---+ +---+ +---+ +---+ +---+ +---+ #=>|elt|x x| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>|elt|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# _mt_list_lock_prev(elt) Locks the link that starts at the prev pointer of the designated element. The link is replaced by two locked pointers, and a pointer to the prev element is returned. The link must then be unlocked using _mt_list_unlock_prev() passing it this pointer, or mt_list_unlock_link(). This function is not intended to be used by applications, and makes certain assumptions about the state of the list pertaining to its use in iterators. before: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |x x|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# Return value: &A _mt_list_unlock_prev(elt, back) Unlocks the link that starts at the prev pointer of the designated element and is supposed to end at . This function is not intended to be used by applications, and makes certain assumptions about the state of the list pertaining to its use in iterators. before: back / +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |x x|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# mt_list_lock_next(elt) Cuts the list after the specified element. The link is replaced by two locked pointers, and is returned as a list element. The list must then be unlocked using mt_list_unlock_link() or mt_list_unlock_full() applied to the returned list element. before: +---+ +---+ +---+ +---+ +---+ +---+ #=>|elt|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>|elt|x x| B |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# Return elt B value: <===> mt_list_lock_prev(elt) Cuts the list before the specified element. The link is replaced by two locked pointers, and is returned as a list element. The list must then be unlocked using mt_list_unlock_link() or mt_list_unlock_full() applied to the returned list element. before: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |x x|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# Return A elt value: <===> mt_list_lock_elem(elt) Locks the element only. Both of its pointers are replaced by two locked pointers, and the previous ones are returned as a list element. It's not possible to remove such an element from a list since neighbors are not locked. The sole purpose of this operation is to prevent another thread from visiting this element during an operation. The element must then be unlocked using mt_list_unlock_elem() applied to the returned element. before: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |=> x|elt|x <=| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# Return A C value: <===> mt_list_unlock_elem(elt, ends) Unlocks the element only by restoring its backed up contents from , as returned by a previous call to mt_list_lock_elem(elt). The ends of the links are not affected, only the element is touched. This is intended to terminate a critical section started by a call to mt_list_lock_elem(). It may also be used on a fully locked element processed by mt_list_lock_full() in which case it will leave the list still locked. before: A C ends: <===> +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |=> x|elt|x <=| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# before: A C ends: <===> +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |x x|elt|x x| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |x <=|elt|=> x| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# mt_list_unlock_self(elt) Unlocks the element only by resetting it (i.e. making it loop over itself). This is useful in the locked variant of iterators when the element is to be removed from the list and first needs to be unlocked because it's shared with other operations (such as a concurrent attempt to delete it from a list), or simply in case it is to be recycled in a usable state. The ends of the links are not affected, only the element is touched. This is normally only used from within locked iterators, which perform a full lock (both links are locked). before: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |x x|elt|x x| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>|elt|<=# #=>| A |x x| C |<===>| D |<===>| E |<===>| F |<=# # +---+ # # +---+ +---+ +---+ +---+ +---+ # #=========# #=================================================# mt_list_lock_full(elt) Locks both the element and its surrounding links. The extremities of the previous links are returned as a single list element (which corresponds to the element's before locking). The list must then be unlocked using mt_list_unlock_full() to reconnect the element to the list and unlock both, or mt_list_unlock_link() to effectively remove the element. before: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |x x|elt|x x| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# Return A C value: <=============> mt_list_unlock_link(ends) Connects two ends in a list together, effectively unlocking the list if it was locked. It takes a list head which contains a pointer to the prev and next elements to connect together. It normally is a copy of a previous link returned by functions such as mt_list_lock_next(), mt_list_lock_prev(), or mt_list_lock_full(). If applied after mt_list_lock_full(), it will result in the list being reconnected without the element, which remains locked, effectively deleting it. Note that this is not meant to be used from within iterators, as the iterator will automatically and safely reconnect ends after each iteration. before: A C Ends: <===> +---+ +---+ +---+ +---+ +---+ #=>| A |x x| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ # #=================================================# after: +---+ +---+ +---+ +---+ +---+ #=>| A |<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ # #=================================================# mt_list_unlock_full(elt, ends) Connects the specified element to the elements pointed to by the specified , which is a backup copy of the previous list member of the element prior to locking it using mt_list_lock_full() or mt_list_lock_elem(). This is normally used to unlock an element and a list, but may also be used to manually insert an element into an opened list (which should still be locked). The element's list member is technically assigned a copy of and both sides point to the element. This must not be used inside an iterator as it would also unlock the list itself and make the loop visit nodes in an unknown state. before: +---+ elt: x|elt|x +---+ A C ends: <=============> +---+ +---+ +---+ +---+ +---+ #=>| A |x x| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ # #===========================================================# after: +---+ +---+ +---+ +---+ +---+ +---+ #=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=# # +---+ +---+ +---+ +---+ +---+ +---+ # #===========================================================# MT_LIST_FOR_EACH_ENTRY_LOCKED(item, list_head, member, back) Iterates through a list of items of type "typeof(*item)" which are linked via a "struct mt_list" member named . A pointer to the head of the list is passed in . is a temporary struct mt_list, used internally. It contains a copy of the contents of the current item's list member before locking it. This macro is implemented using two nested loops, each defined as a separate macro for easier inspection. The inner loop will run for each element in the list, and the outer loop will run only once to do some cleanup and unlocking when the end of the list is reached or user breaks from inner loop. It is safe to break from this macro as the cleanup will be performed anyway, but it is strictly forbidden to branch (goto or return) from the loop because skipping the cleanup will lead to undefined behavior. During the scan of the list, the item has both of its links locked, so concurrent operations on the list are safe. However the thread holding the list locked must be careful not to perform other locking operations. In order to remove the current element, setting to NULL is sufficient to make the inner loop not try to re-attach it. It is recommended to reinitialize it though if it is expected to be reused, so as not to leave its pointers locked. Same if other threads are trying to concurrently operate on the element. From within the loop, the list looks like this: MT_LIST_FOR_EACH_ENTRY_LOCKED(item, lh, list, back) { // A C // back: <=============> // item->list // +---+ +---+ +-V-+ +---+ +---+ +---+ // #=>|lh |<===>| A |x x| |x x| C |<===>| D |<===>| E |<=# // # +---+ +---+ +---+ +---+ +---+ +---+ # // #===========================================================# } This means that only the current item as well as its two neighbors are locked. It is thus possible to act on any other part of the list in parallel (other threads might have begun slightly earlier). However if a thread is too slow to proceed, other threads may quickly reach its position, and all of them will then wait on the same element, slowing down the progress. MT_LIST_FOR_EACH_ENTRY_UNLOCKED(item, list_head, member, back) Iterates through a list of items of type "typeof(*item)" which are linked via a "struct mt_list" member named . A pointer to the head of the list is passed in . is a temporary struct mt_list, used internally. It contains a copy of the contents of the current item's list member before resetting it. This macro is implemented using two nested loops, each defined as a separate macro for easier inspection. The inner loop will run for each element in the list, and the outer loop will run only once to do some cleanup and unlocking when the end of the list is reached or user breaks from inner loop. It is safe to break from this macro as the cleanup will be performed anyway, but it is strictly forbidden to branch (goto or return) from the loop because skipping the cleanup will lead to undefined behavior. During the scan of the list, the item has both of its neighbours locked, with both of its ends pointing to itself. Thus, concurrent walks on the list are safe, but not direct accesses to the element. In order to remove the current element, setting to NULL is sufficient to make the inner loop not try to re-attach it. There is no need to reinitialize it since it is already done. If the element is left, it will be re-attached to the list. This version is meant as a more user-friendly method to walk over a list in which it is known by design that elements are not directly accessed (e.g. a pure MPMC queue). The typical pattern which corresponds to this case is when the first operation in the iterator's body is a call to unlock the iterator, which is then no longer needed (though harmless). From within the loop, the list looks like this: MT_LIST_FOR_EACH_ENTRY_UNLOCKED(item, lh, list, back) { // back: A C // item->list <===> // +-V-+ +---+ +---+ +---+ +---+ +---+ // #>| |<# #=>|lh |<===>| A |x x| C |<===>| D |<===>| E |<=# // # +---+ # # +---+ +---+ +---+ +---+ +---+ # // #=======# #=================================================# } This means that only the current item's neighbors are locked. It is thus possible to act on any other part of the list in parallel (other threads might have begun slightly earlier) but not on the element. However if a thread is too slow to proceed, other threads may quickly reach its position, and all of them will then wait on the same element, slowing down the progress. Examples -------- The example below collects up to 50 jobs from a shared list that are compatible with the current thread, and moves them to a local list for later processing. The same pointers are used for both lists and placed in an anonymous union. struct job { union { struct list list; struct mt_list mt_list; }; unsigned long thread_mask; /* 1 bit per eligible thread */ /* struct-specific stuff below */ ... }; extern struct mt_list global_job_queue; extern struct list local_job_queue; struct mt_list back; struct job *item; int budget = 50; /* collect up to 50 shared items */ MT_LIST_FOR_EACH_ENTRY_LOCKED(item, &global_job_queue, mt_list, back) { if (!(item->thread_mask & current_thread_bit)) continue; /* job not eligible for this thread */ LIST_APPEND(&local_job_queue, &item->list); item = NULL; if (!--budget) break; } /* process extracted items */ LIST_FOR_EACH(item, &local_job_queue, list) { ... }