haproxy/doc/internals/api/mt_list.txt
Willy Tarreau c618ed5ff4 MAJOR: import: update mt_list to support exponential back-off
The new mt_list code supports exponential back-off on conflict, which
is important for use cases where there is contention on a large number
of threads. The API evolved a little bit and required some updates:

  - mt_list_for_each_entry_safe() is now in upper case to explicitly
    show that it is a macro, and only uses the back element, doesn't
    require a secondary pointer for deletes anymore.

  - MT_LIST_DELETE_SAFE() doesn't exist anymore, instead one just has
    to set the list iterator to NULL so that it is not re-inserted
    into the list and the list is spliced there. One must be careful
    because it was usually performed before freeing the element. Now
    instead the element must be nulled before the continue/break.

  - MT_LIST_LOCK_ELT() and MT_LIST_UNLOCK_ELT() have always been
    unclear. They were replaced by mt_list_cut_around() and
    mt_list_connect_elem() which more explicitly detach the element
    and reconnect it into the list.

  - MT_LIST_APPEND_LOCKED() was only in haproxy so it was left as-is
    in list.h. It may however possibly benefit from being upstreamed.

This required tiny adaptations to event_hdl.c and quic_sock.c. The
test case was updated and the API doc added. Note that in order to
keep include files small, the struct mt_list definition remains in
list-t.h (par of the internal API) and was ifdef'd out in mt_list.h.

A test on QUIC with both quictls 1.1.1 and wolfssl 5.6.3 on ARM64 with
80 threads shows a drastic reduction of CPU usage thanks to this and
the refined memory barriers. Please note that the CPU usage on OpenSSL
3.0.9 is significantly higher due to the excessive use of atomic ops
by openssl, but 3.1 is only slightly above 1.1.1 though:

  - before: 35 Gbps, 3.5 Mpps, 7800% CPU
  - after:  41 Gbps, 4.2 Mpps, 2900% CPU
2023-09-13 11:50:33 +02:00

436 lines
20 KiB
Plaintext

MT_LIST: multi-thread aware doubly-linked lists
Abstract
--------
mt_lists are a form of doubly-linked lists that support thread-safe standard
list operations such as insert / append / delete / pop, as well as a safe
iterator that supports deletion and concurrent use.
Principles
----------
The lists are designed to minimize contention in environments where elements
may be concurrently manipulated at different locations. The principle is to
act on the links between the elements instead of the elements themselves. This
is achieved by temporarily "cutting" these links, which effectively consists in
replacing the ends of the links with special pointers serving as a lock, called
MT_LIST_BUSY. An element is considered locked when either its next or prev
pointer is equal to this MT_LIST_BUSY pointer (or both).
The next and prev pointers are replaced by the list manipulation functions
using atomic exchange. This means that the caller knows if the element it tries
to replace was already locked or if it owns it. In order to replace a link,
both ends of the link must be owned by the thread willing to replace it.
Similarly when adding or removing an element, both ends of the elements must be
owned by the thread trying to manipulate the element.
Appending or inserting elements comes in two flavors: the standard one which
considers that the element is already owned by the thread and ignores its
contents; this is the most common usage for a link that was just allocated or
extracted from a list. The second flavor doesn't trust the thread's ownership
of the element and tries to own it prior to adding the element; this may be
used when this element is a shared one that needs to be placed into a list.
Removing an element always consists in owning the two links surrounding it,
hence owning the 4 pointers.
Scanning the list consists in locking the element to (re)start from, locking
the link used to jump to the next element, then locking that element and
unlocking the previous one. All types of concurrency issues are supported
there, including elements disappearing while trying to lock them. It is
perfectly possible to have multiple threads scan the same list at the same
time, and it's usually efficient. However, if those threads face a single
contention point (e.g. pause on a locked element), they may then restart
working from the same point all at the same time and compete for the same links
and elements for each step, which will become less efficient. However, it does
work fine.
There's currently no support for shared locking (e.g. rwlocks), elements and
links are always exclusively locked. Since locks are attempted in a sequence,
this creates a nested lock pattern which could theoretically cause deadlocks
if adjacent elements were locked in parallel. This situation is handled using
a rollback mechanism: if any thread fails to lock any element or pointer, it
detects the conflict with another thread and entirely rolls back its operations
in order to let the other thread complete. This rollback is what aims at
guaranteeing forward progress. There is, however, a non-null risk that both
threads spend their time rolling back and trying again. This is covered using
exponential back-off that may grow to large enough values to let a thread lock
all the pointer it needs to complete an operation. Other mechanisms could be
implemented in the future such as rotating priorities or random lock numbers
to let both threads know which one must roll back and which one may continue.
Due to certain operations applying to the type of an element (iterator, element
retrieval), some parts do require macros. In order to avoid keeping too
confusing an API, all operations are made accessible via macros. However, in
order to ease maintenance and improve error reporting when facing unexpected
arguments, all the code parts that were compatible have been implemented as
inlinable functions instead. And in order to help with performance profiling,
it is possible to prevent the compiler from inlining all the functions that
may loop. As a rule of thumb, operations which only exist as macros do modify
one or more of their arguments.
All exposed functions are called "mt_list_something()", all exposed macros are
called "MT_LIST_SOMETHING()", possibly mapping 1-to-1 to the equivalent
function, and the list element type is called "mt_list".
Operations
----------
mt_list_append(el1, el2)
Adds el2 before el1, which means that if el1 is the list's head, el2 will
effectively be appended to the end of the list.
before:
+---+
|el2|
+---+
V
+---+ +---+ +---+ +---+ +---+ +---+
#=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
after:
+---+ +---+ +---+ +---+ +---+ +---+ +---+
#=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<===>|el2|<=#
# +---+ +---+ +---+ +---+ +---+ +---+ +---+ #
#=====================================================================#
mt_list_try_append(el1, el2)
Tries to add el2 before el1, which means that if el1 is the list's head,
el2 will effectively be appended to the end of the list. el2 will only be
added if it's deleted (loops over itself). The operation will return zero if
this is not the case (el2 is not empty anymore) or non-zero on success.
before:
#=========#
# +---+ #
#=>|el2|<=#
+---+
V
+---+ +---+ +---+ +---+ +---+ +---+
#=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
after:
+---+ +---+ +---+ +---+ +---+ +---+ +---+
#=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<===>|el2|<=#
# +---+ +---+ +---+ +---+ +---+ +---+ +---+ #
#=====================================================================#
mt_list_insert(el1, el2)
Adds el2 after el1, which means that if el1 is the list's head, el2 will
effectively be insert at the beginning of the list.
before:
+---+
|el2|
+---+
V
+---+ +---+ +---+ +---+ +---+ +---+
#=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
after:
+---+ +---+ +---+ +---+ +---+ +---+ +---+
#=>|el1|<===>|el2|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ +---+ #
#=====================================================================#
mt_list_try_insert(el1, el2)
Tries to add el2 after el1, which means that if el1 is the list's head,
el2 will effectively be inserted at the beginning of the list. el2 will only
be added if it's deleted (loops over itself). The operation will return zero
if this is not the case (el2 is not empty anymore) or non-zero on success.
before:
#=========#
# +---+ #
#=>|el2|<=#
+---+
V
+---+ +---+ +---+ +---+ +---+ +---+
#=>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
after:
+---+ +---+ +---+ +---+ +---+ +---+ +---+
#=>|el1|<===>|el2|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ +---+ #
#=====================================================================#
mt_list_delete(el1)
Removes el1 from the list, and marks it as deleted, wherever it is. If
the element was already not part of a list anymore, 0 is returned,
otherwise non-zero is returned if the operation could be performed.
before:
+---+ +---+ +---+ +---+ +---+ +---+ +---+
#=>| A |<===>|el1|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ +---+ #
#=====================================================================#
after:
+---+ +---+ +---+ +---+ +---+ +---+
#=>| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
+---+
#=>|el1|<=#
# +---+ #
#=========#
mt_list_behead(l)
Detaches a list of elements from its head with the aim of reusing them to
do anything else. The head will be turned to an empty list, and the list
will be partially looped: the first element's prev will point to the last
one, and the last element's next will be NULL. The pointer to the first
element is returned, or NULL if the list was empty. This is essentially
used when recycling lists of unused elements, or to grab a lot of elements
at once for local processing. It is safe to be run concurrently with the
insert/append operations performed at the list's head, but not against
modifications performed at any other place, such as delete operation.
before:
+---+ +---+ +---+ +---+ +---+ +---+ +---+
#=>| L |<===>| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ +---+ #
#=====================================================================#
after:
+---+ +---+ +---+ +---+ +---+ +---+ +---+
#=>| L |<=# ,--| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<-.
# +---+ # | +---+ +---+ +---+ +---+ +---+ +---+ |
#=========# `-----------------------------------------------------------'
mt_list_pop(l)
Removes the list's first element, returns it deleted. If the list was empty,
NULL is returned. When combined with mt_list_append() this can be used to
implement MPMC queues for example. A macro MT_LIST_POP() is provided for a
more convenient use; instead of returning the list element, it will return
the structure holding the element, taking care of preserving the NULL.
before:
+---+ +---+ +---+ +---+ +---+ +---+ +---+
#=>| L |<===>| A |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ +---+ #
#=====================================================================#
after:
+---+ +---+ +---+ +---+ +---+ +---+
#=>| L |<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
+---+
#=>| A |<=#
# +---+ #
#=========#
mt_list_cut_after(elt)
Cuts the list after the specified element. The link is replaced by two
locked pointers, and is returned as a list element. The list must then
be unlocked using mt_list_reconnect() or mt_list_connect_elem() applied
to the returned list element.
before:
+---+ +---+ +---+ +---+ +---+ +---+
#=>|elt|<===>| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
after:
+---+ +---+ +---+ +---+ +---+ +---+
#=>|elt|x x| B |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
Return elt B
value: <===>
mt_list_cut_before(elt)
Cuts the list before the specified element. The link is replaced by two
locked pointers, and is returned as a list element. The list must then
be unlocked using mt_list_reconnect() or mt_list_connect_elem() applied
to the returned list element.
before:
+---+ +---+ +---+ +---+ +---+ +---+
#=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
after:
+---+ +---+ +---+ +---+ +---+ +---+
#=>| A |x x|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
Return A elt
value: <===>
mt_list_cut_around(elt)
Cuts the list both before and after the specified element. Both the list
and the element's pointers are locked. The extremities of the previous
links are returned as a single list element (which corresponds to the
element's before locking). The list must then be unlocked using
mt_list_connect_elem() to reconnect the element to the list and unlock
both, or mt_list_reconnect() to effectively remove the element.
before:
+---+ +---+ +---+ +---+ +---+ +---+
#=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
after:
+---+ +---+ +---+ +---+ +---+ +---+
#=>| A |x x|elt|x x| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
Return A C
value: <=============>
mt_list_reconnect(ends)
Connect both ends of the specified locked list as if they were linked
together, and unlocks the list. This can complete an element removal
operation that was started using mt_list_cut_around(), or can restore a
list that was temporarily locked by mt_list_cut_{after,before}().
before:
A C
Ends: <===>
+---+ +---+ +---+ +---+ +---+
#=>| A |x x| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ #
#=================================================#
after:
+---+ +---+ +---+ +---+ +---+
#=>| A |<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ #
#=================================================#
mt_list_connect_elem(elt,ends)
Connects the specified element to the elements pointed to by the specified
ends. This can be used to insert an element into a previously locked and
cut list, or to restore a list as it was before mt_list_cut_around(elt).
The element's list part is effectively replaced by the contents of the
ends.
before:
+---+
elt: x|elt|x
+---+
A C
ends: <=============>
+---+ +---+ +---+ +---+ +---+
#=>| A |x x| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
after:
+---+ +---+ +---+ +---+ +---+ +---+
#=>| A |<===>|elt|<===>| C |<===>| D |<===>| E |<===>| F |<=#
# +---+ +---+ +---+ +---+ +---+ +---+ #
#===========================================================#
MT_LIST_FOR_EACH_ENTRY_SAFE(item, list_head, member, back)
Iterates <item> through a list of items of type "typeof(*item)" which are
linked via a "struct mt_list" member named <member>. A pointer to the head
of the list is passed in <list_head>. <back> is a temporary struct mt_list,
used internally. It contains a copy of the contents of the current item's
list member before locking it. This macro is implemented using two nested
loops, each defined as a separate macro for easier inspection. The inner
loop will run for each element in the list, and the outer loop will run
only once to do some cleanup and unlocking when the end of the list is
reached or user breaks from inner loop. It is safe to break from this macro
as the cleanup will be performed anyway, but it is strictly forbidden to
branch (goto or return) from the loop because skipping the cleanup will
lead to undefined behavior. During the scan of the list, the item is locked
thus disconnected and the list locked around it, so concurrent operations
on the list are safe. However the thread holding the list locked must be
careful not to perform other locking operations. In order to remove the
current element, setting <item> to NULL is sufficient to make the inner
loop not try to re-attach it. It is recommended to reinitialize it though
if it is expected to be reused, so as not to leave its pointers locked.
From within the loop, the list looks like this:
MT_LIST_FOR_EACH_ENTRY_SAFE(item, lh, list, back) {
// A C
// back: <=============>
// item->list
// +---+ +---+ +-V-+ +---+ +---+ +---+
// #=>|lh |<===>| A |x x| |x x| C |<===>| D |<===>| E |<=#
// # +---+ +---+ +---+ +---+ +---+ +---+ #
// #===========================================================#
}
This means that only the current item as well as its two neighbors are
locked. It is thus possible to act on any other part of the list in
parallel (other threads might have begun slightly earlier). However if
a thread is too slow to proceed, other threads may quickly reach its
position, and all of them will then wait on the same element, slowing
down the progress.
Examples
--------
The example below collects up to 50 jobs from a shared list that are compatible
with the current thread, and moves them to a local list for later processing.
The same pointers are used for both lists and placed in an anonymous union.
struct job {
union {
struct list list;
struct mt_list mt_list;
};
unsigned long thread_mask; /* 1 bit per eligible thread */
/* struct-specific stuff below */
...
};
extern struct mt_list global_job_queue;
extern struct list local_job_queue;
struct mt_list back;
struct job *item;
int budget = 50;
/* collect up to 50 shared items */
MT_LIST_FOR_EACH_ENTRY_SAFE(item, &global_job_queue, mt_list, back) {
if (!(item->thread_mask & current_thread_bit))
continue; /* job not eligible for this thread */
LIST_APPEND(&local_job_queue, &item->list);
item = NULL;
if (!--budget)
break;
}
/* process extracted items */
LIST_FOR_EACH(item, &local_job_queue, list) {
...
}