Commit Graph

85 Commits

Remi Tricot-Le Breton
ac9c49b40d MEDIUM: cache: Use dedicated cache tree lock alongside shctx lock
Every use of the cache tree was covered by the shctx lock even when no
operations were performed on the shared_context lists (avail and hot).
This patch adds a dedicated RW lock for the cache so that blocks of code
that work on the cache tree only can use this lock instead of the
superseding shctx one. This is useful for operations during which the
concerned blocks are already in the hot list.
When the two locks need to be taken at the same time, in
http_action_req_cache_use and in shctx_row_reserve_hot, the shctx one
must be taken first.
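For illustration, the resulting ordering can be sketched as follows (a hedged sketch; CACHE_LOCK, cache->lock and the surrounding calls are placeholder names, not the exact code from the patch):

    /* both structures are touched: the shctx lock is taken first */
    shctx_lock(shctx);
    HA_RWLOCK_WRLOCK(CACHE_LOCK, &cache->lock);
    /* ... reserve rows, move blocks to hot, update the cache tree ... */
    HA_RWLOCK_WRUNLOCK(CACHE_LOCK, &cache->lock);
    shctx_unlock(shctx);

    /* tree-only work on blocks already in the hot list: cache lock only */
    HA_RWLOCK_RDLOCK(CACHE_LOCK, &cache->lock);
    /* ... lookup in the cache tree ... */
    HA_RWLOCK_RDUNLOCK(CACHE_LOCK, &cache->lock);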
A new parameter needed to be added to the shared_context's free_block
callback prototype so that cache_free_block can take the cache lock and
release it afterwards.
2023-11-16 19:35:10 +01:00
Willy Tarreau
cbbee15462 CLEANUP: ring: rename the ring lock "RING_LOCK" instead of "LOGSRV_LOCK"
The ring lock was initially mostly used for the logs and used to inherit
its name in lock stats. Now that it's exclusively used by rings, let's
rename it accordingly.
2023-09-20 21:38:33 +02:00
Willy Tarreau
86854dd032 MEDIUM: threads: detect excessive thread counts vs cpu-map
This detects when there are more threads bound via cpu-map than CPUs
enabled in cpu-map, or when there are more total threads than the total
number of CPUs available at boot (for unbound threads) and configured
for bound threads. In this case, a warning is emitted to explain the
problems it will cause, and explaining how to address the situation.

Note that some configurations will not be detected as faulty because
the algorithmic complexity to resolve all arrangements grows in O(N!).
This means that having 3 threads on 2 CPUs and one thread on 2 CPUs
will not be detected as it's 4 threads for 4 CPUs. But at least configs
such as T0:(1,4) T1:(1,4) T2:(2,4) T3:(3,4) will not trigger a warning
since they're valid.
2023-09-04 19:39:17 +02:00
Willy Tarreau
8357f950cb MEDIUM: threads: detect incomplete CPU bindings
It's very easy to mess up with some cpu-map directives and to leave
some threads unbound. Let's add a test that checks that either all
threads are bound or none are bound, but that we do not face the
intermediate situation where some are pinned and others are left
wandering around, possibly on the same CPUs as bound ones.

Note that this should not be backported, or maybe turned into a
notice only, as it appears that it will easily catch invalid
configs and that may break updates for some users.
2023-09-04 19:39:17 +02:00
Willy Tarreau
151f9a2808 BUG/MINOR: cpuset: remove the bogus "proc" from the cpu_map struct
We're currently having a problem with the porting of cpu_map from
processes to thread-groups as it happened in 2.7 with commit 5b09341c0
("MEDIUM: cpu-map: replace the process number with the thread group
number"), though it seems that it has deeper roots even in 2.0 and
that it was progressively made wrong over time.

The issue stems in the way the per-process and per-thread cpu-sets were
employed over time. Originally only processes were supported. Then
threads were added after an optional "/" and it was documented that
"cpu-map 1" is exactly equivalent to "cpu-map 1/all" (this was clarified
in 2.5 by commit 317804d28 ("DOC: update references to process numbers
in cpu-map and bind-process")).

The reality is different: when processes were still supported, setting
"cpu-map 1" would apply the mask to the process itself (and only when
run in the background, which is not documented either and is also a
bug for another fix), and would be combined with any possible per-thread
mask when calculating the threads' affinity, possibly resulting in empty
sets. However, "cpu-map 1/all" would only set the mask for the threads
and not the process. As such the following:

    cpu-map 1 odd
    cpu-map 1/1-8 even

would leave no CPU while doing:

    cpu-map 1/all odd
    cpu-map 1/1-8 even

would allow all CPUs.

While such configs are very unlikely to ever be met (which is why this
bug is tagged minor), this is becoming quite more visible while testing
automatic CPU binding during 2.9 development because due to this bug
it's much more common to end up with incorrect bindings.

This patch fixes it by simply removing the .proc entry from cpu_map and
always setting all threads' maps. The process is no longer arbitrarily
bound to the group 1's mask, but in case threads are disabled, we'll
use thread 1's mask since it contains the configured CPUs.

This fix should be backported at least to 2.6, but no need to insist if
it resists as it's easier to break cpu-map than to fix an unlikely issue.
2023-07-20 11:01:09 +02:00
Willy Tarreau
8fc7073906 BUG/MEDIUM: threads: fix a tiny race in thread_isolate()
Aurélien found a tiny race in thread_isolate() that can allow a thread
that was running under isolation to continue running while another one
enters isolation. The reason is that the check for harmless is only
done before winning the CAS, but since the previously isolated thread
doesn't wait for !rdv_request in thread_release(), it can effectively
continue its activities while the next one believes it's isolated. A
proper solution consists in looping once again in thread_isolate() to
recheck (and wait) for all threads to be isolated once the CAS is won.
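Schematically (a hedged sketch of the control flow only; the helper names are simplified placeholders, not the actual functions):

    for (;;) {
        wait_for_all_threads_harmless();      /* as before */
        if (cas_self_as_isolated_thread())    /* try to win the slot */
            break;
    }
    /* the fix: the previously isolated thread may still be finishing in
     * thread_release(), so recheck and wait once more for all threads to
     * be harmless before considering ourselves truly isolated
     */
    wait_for_all_threads_harmless();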

The issue was introduced in 2.7 by commit 598cf3f22 ("MAJOR: threads:
change thread_isolate to support inter-group synchronization") so the
fix needs to be backported there.
2023-05-27 13:53:46 +02:00
eaglegai
ef667b1ad8 BUG/MINOR: thread: add a check for pthread_create
preload_libgcc_s() uses pthread_create to create a thread and then calls
pthread_join to use it, but it doesn't check if the call is successful.
So add a check to avoid a potential crash.
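A minimal sketch of the kind of check meant here (error handling details are illustrative, not the exact patch):

    #include <pthread.h>
    #include <stdio.h>

    static void *dummy(void *arg)
    {
        return arg;
    }

    /* sketch: bail out cleanly if thread creation fails instead of
     * calling pthread_join() on an uninitialized pthread_t
     */
    static int preload_sketch(void)
    {
        pthread_t thr;

        if (pthread_create(&thr, NULL, dummy, NULL) != 0) {
            fprintf(stderr, "failed to create the preload thread\n");
            return 0;
        }
        pthread_join(thr, NULL);
        return 1;
    }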
2023-05-26 12:08:23 +02:00
Amaury Denoyelle
e83f937cc1 MEDIUM: quic: use a global CID trees list
Previously, quic_connection_id entries were stored in per-thread trees.
Datagrams were first dispatched to the correct thread using the encoded
TID before a tree lookup was done.

Remove these trees and replace them with a global list of 256 trees. A
CID uses the list index corresponding to its first byte. On datagram
dispatch, the CID is looked up in its tree and the TID is retrieved
using the new member quic_connection_id.tid. A read-write lock protects
each of these trees. With 256 entries, contention is expected to be
reduced.

A new structure quic_cid_tree serves as a tree container associated with
its read-write lock. An API is implemented to ensure lock safety for
insert/lookup/delete operations.
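A hedged sketch of such a container (field names and the lookup helper are illustrative, not necessarily those of the patch):

    /* one tree per possible first CID byte, each with its own lock */
    struct quic_cid_tree {
        __decl_thread(HA_RWLOCK_T lock);
        struct eb_root root;
    };

    static struct quic_cid_tree quic_cid_trees[256];

    /* sketch: the tree holding a CID is selected from its first byte */
    static inline struct quic_cid_tree *quic_cid_tree_of(const unsigned char *cid)
    {
        return &quic_cid_trees[cid[0]];
    }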

This patch is a step towards breaking the affinity between a CID and
its encoded TID. This is required to be able to migrate a quic_conn
after accept and select its thread based on load.

This should be backported up to 2.7 after a period of observation.
2023-04-18 16:54:17 +02:00
Amaury Denoyelle
22a368ce58 CLEANUP: quic: remove unused QUIC_LOCK label
QUIC_LOCK label is never used. Indeed, lock usage is minimal on QUIC as
every connection is pinned to its owning thread.

This should be backported up to 2.7.
2023-04-18 16:20:47 +02:00
Willy Tarreau
97da942ba6 MINOR: thread: keep a bitmask of enabled groups in thread_set
We're only checking for 0, 1, or >1 groups enabled there, and we'll soon
need to be more precise and know quickly which groups are non-empty.
Let's just replace the count with a mask of enabled groups. This will
allow us to quickly spot the presence of any such group in a set.
2023-04-13 16:57:51 +02:00
Aurelien DARRAGON
ef6ca67176 BUG/MEDIUM: event_hdl: clean soft-stop handling
soft-stop was not explicitly handled in event_hdl API.

Because of this, event_hdl was causing some leaks on deinit paths.
Moreover, a task responsible for handling events could require some
additional cleanups (ie: advanced async task), and as the task was not
protected against abort when soft-stopping, such cleanup could not be
performed unless the task itself implements the required protections,
which is not optimal.

Consider this new approach:
 'jobs' global variable is incremented whenever an async subscription is
 created to prevent the related task from being aborted before the task
 acknowledges the final END event.

 Once the END event is acknowledged and freed by the task, the 'jobs'
 variable is decremented, and the deinit process may continue (including
 the abortion of remaining tasks not guarded by the 'jobs' variable).

To do this, a new global mt_list is required: known_event_hdl_sub_list
This list tracks the known (initialized) subscription lists within the
process.

sub_lists are automatically added to the "known" list when calling
event_hdl_sub_list_init(), and are removed from the list with
event_hdl_sub_list_destroy().

This allows us to implement a global thread-safe event_hdl deinit()
function that is automatically called on soft-stop thanks to signal(0).
When event_hdl deinit() is initiated, we simply iterate over the known
subscription lists to destroy them.

event_hdl_subscribe_ptr() was slightly modified to make sure that a sub_list
may not accept new subscriptions once it is destroyed (removed from the
known list).
This can occur between the time the soft-stop is initiated (signal(0)) and
haproxy actually enters the deinit() function (once tasks are either
finished or aborted and other threads have already joined).

It is safe to destroy() the subscription list multiple times as long
as the pointer is still valid (ie: first on soft-stop when handling
the '0' signal, then from regular deinit() path): the function does
nothing if the subscription list is already removed.

We partially reverted "BUG/MINOR: event_hdl: make event_hdl_subscribe thread-safe"
since we can use parent mt_list locking instead of a dedicated lock to make
the check against duplicate subscription IDs.
(insert_lock is not useful anymore)

The check in itself is not changed, only the locking method.

sizeof(event_hdl_sub_list) slightly increases: from 24 bytes to 32 bytes due
to the additional mt_list struct within it.

With that said, having thread-safe list to store known subscription lists
is a good thing: it could help to implement additional management
logic for subscription lists and could be useful to add some stats or
debugging tools in the future.

If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
2023-04-05 08:58:17 +02:00
Aurelien DARRAGON
d514ca45c6 BUG/MINOR: event_hdl: make event_hdl_subscribe thread-safe
List insertion in event_hdl_subscribe() was not thread-safe when dealing
with unique identifiers. Indeed, in this case the list insertion is
conditional (we check for a duplicate, then we insert). And while we're
using mt lists for this, the whole operation is not atomic: there is a
race between the check and the insertion.
This could lead to the same ID being registered multiple times with
concurrent calls to event_hdl_subscribe() on the same ID.

To fix this, we add a dedicated 'insert_lock' in the subscription
list struct. The lock's cost is nearly 0 since it is only used when
registering identified subscriptions and the lock window is very short:
we only guard the duplicate check and the list insertion to make the
conditional insertion "atomic" within a given subscription list.
This is the only place where we need the lock: as soon as the item is
properly inserted we're out of trouble because all other operations on
the list are already thread-safe thanks to mt lists.
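The guarded window can be pictured like this (a hedged sketch; the lock macros, lookup helper and field names are illustrative):

    /* make the duplicate check and the insertion atomic per sub_list */
    HA_SPIN_LOCK(EHDL_LOCK, &sub_list->insert_lock);
    if (!event_hdl_lookup_subscription(sub_list, lookup_id)) {
        MT_LIST_APPEND(&sub_list->head, &new_sub->mt_list);
        inserted = 1;
    }
    HA_SPIN_UNLOCK(EHDL_LOCK, &sub_list->insert_lock);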

A new lock hint is introduced: LOCK_EHDL, which is dedicated to event_hdl.

The patch may seem quite large since we had to rework the logic around
the subscribe function and switch from simple mt_list to a dedicated
struct wrapping both the mt_list and the insert_lock for the
event_hdl_sub_list type.
(sizeof(event_hdl_sub_list) is now 24 instead of 16)

However, all the changes are internal: we don't break the API.

If 68e692da0 ("MINOR: event_hdl: add event handler base api")
is being backported, then this commit should be backported with it.
2023-04-05 08:58:17 +02:00
Ilya Shipitsin
07be66d21b CLEANUP: assorted typo fixes in the code and comments
This is the 35th iteration of typo fixes
2023-04-01 18:33:40 +02:00
Willy Tarreau
1b536a11e7 BUILD: thread: silence a build warning when threads are disabled
When threads are disabled, the compiler complains that we might be
accessing tg->abs[] out of bounds since the array is of size 1. It
cannot know that the condition to do this is never met, and given
that it's not in a fast path, we can make it more obvious.
2023-03-22 10:40:06 +01:00
Willy Tarreau
cf0d0eedc7 BUG/MINOR: thread: report thread and group counts in the correct order
In case too many thread groups are needed for the threads, we emit
an error indicating the problem. Unfortunately the threads and groups
counts were reversed.

This can be backported to 2.6.
2023-03-09 11:40:56 +01:00
Willy Tarreau
f5b63277f4 BUG/MINOR: init: properly detect NUMA bindings on large systems
The NUMA detection code tries not to interfere with any taskset the user
could have specified in init scripts. For this it compares the number of
CPUs available with the number the process is bound to. However, the CPU
count is retrieved after an upper bound of MAX_THREADS has been applied, so
if the machine has more than 64 CPUs, the comparison always fails and
makes haproxy think the user has already enforced a binding, and it does
not pin it anymore to a single NUMA node.

This can be verified by issuing:

  $ socat /path/to/sock - <<< "show info" | grep thread

On a dual 48-CPU machine it reports 64, implying that threads are allowed
to run on the second socket:

  Nbthread: 64

With this fix, the function properly reports 96, and the output shows 48,
indicating that a single NUMA node was used:

  Nbthread: 48

Of course nothing is changed when "no numa-cpu-mapping" is specified:

  Nbthread: 64

This can be backported to 2.4.
2023-03-09 10:17:37 +01:00
Willy Tarreau
7b8aac4439 MINOR: tinfo: make thread_set functions return nth group/mask instead of first
thread_set_first_group() and thread_set_first_tmask() were modified
and renamed to instead return the number and mask of the nth group.
Passing zero continues to return the first one, but it will be more
convenient to use this way when building shards.
2023-02-28 10:28:47 +01:00
Frédéric Lécaille
83540ed429 BUILD: thread: Fix several 32 bits compilation issues with uint64_t variables
Cast uint64_t as ullong and difference between two uint64_t as llong.
2023-02-24 09:56:50 +01:00
Willy Tarreau
f91ab7a08c BUG/MEDIUM: thread: fix extraneous shift in the thread_set parser
Aurélien reported a bug making a statement such as "thread 2-2" fail for
a config made of exactly 2 threads. What happens is that the parser for
the "thread" keyword scans a range of thread numbers from either 1..64
or 0,-1,-2 for special values, and presets the bit masks accordingly in
the thread set, except that due to the 1..64 range, the shift length must
be reduced by one. Not doing this causes empty masks for single-bit values
that are exactly equal to the number of threads in the group and fails to
properly parse.
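To make the off-by-one concrete (an illustrative example, not the parser code): thread numbers are 1-based while bit positions are 0-based, so the bit for thread n needs a shift of n - 1.

    /* group of 2 threads, mask of enabled threads is 0x3 */
    ulong bit_buggy = 1UL << 2;        /* 0x4: falls outside the mask, empty set */
    ulong bit_fixed = 1UL << (2 - 1);  /* 0x2: the bit for thread 2 */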

No backport is needed as this was introduced in 2.8-dev3 by commit
bef43dfa6 ("MINOR: thread: add a simple thread_set API").
2023-02-06 18:01:50 +01:00
Willy Tarreau
15c8428060 BUILD: thread: fix build warnings with older gcc compilers
The "{ 0 }" form to initialize an empty structure triggers build warnings
on gcc 4.8, let's use the more common "{ }" instead.
2023-02-04 10:49:01 +01:00
Willy Tarreau
f2988e1447 CLEANUP: listener/thread: remove now unused bind_conf's bind_tgroup/bind_thread
Not needed anymore since last commit, let's get rid of it.
2023-02-03 18:00:21 +01:00
Willy Tarreau
f0de8cacc4 MEDIUM: listener/config: make the "thread" parser rely on thread_sets
Instead of reading and storing a single group and a single mask for a
"thread" directive on a bind line, we now store the complete range in
a thread set that's stored in the bind_conf. The bind_parse_thread()
function now just calls parse_thread_set() to complete the current set,
which starts empty, and thread_resolve_group_mask() was updated to
support retrieving thread group numbers or absolute thread numbers
directly from the pre-filled thread_set, and continue to feed bind_tgroup
and bind_thread. The CLI parsers which were pre-initialized to set the
bind_tgroup to 1 cannot do it anymore as it would prevent one from
restricting the thread set. Instead check_config_validity() now detects
the CLI frontend and passes the info down to thread_resolve_group_mask()
that will automatically use only the group 1's threads for these
listeners. The same is done for the peers listeners for now.

At this step it's already possible to start with all previous valid
configs as well as extended ones supporting comma-delimited thread
sets. In addition the parser already accepts large ranges spanning
multiple groups, but since the underlying listeners infrastructure
is not ready, for now we're maintaining a specific check against this
at the higher level of the config validity check.

The patch is a bit large because thread resolution is performed in
multiple steps, so we need to adjust all of them at once to preserve
functional and technical consistency.
2023-02-03 18:00:21 +01:00
Willy Tarreau
bef43dfa60 MINOR: thread: add a simple thread_set API
The purpose is to be able to store large thread sets, defined by ranges
that may cross group boundaries, as well as define lists of groups and
masks. The thread_set struct implements the storage, and the parser is
in parse_thread_set(), with a focus on "bind" lines, but not only.
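A hedged sketch of what the storage can look like (sizes and field names are illustrative; see struct thread_set in the patch for the real layout):

    struct thread_set_sketch {
        union {
            ulong abs[(MAX_THREADS + LONGBITS - 1) / LONGBITS]; /* absolute thread numbers */
            ulong rel[MAX_TGROUPS];                             /* per-group local masks   */
        };
        ulong grps; /* non-empty groups; 0 means absolute numbering is used */
    };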
2023-02-03 18:00:21 +01:00
Aurelien DARRAGON
739281b3d6 BUG/MEDIUM: thread: consider secondary threads as idle+harmless during boot
idle and harmless bits in the tgroup_ctx structure were not explicitly
set during boot.

    | struct tgroup_ctx ha_tgroup_ctx[MAX_TGROUPS] = { };

As the structure is first statically initialized,
.threads_harmless and .threads_idle are automatically zero-
initialized by the compiler.

Unfortunately, this means that such threads are not considered idle
nor harmless by thread_isolate(_full)() functions until they enter
the polling loop (thread_harmless_now() and thread_idle_now() are
respectively called before entering the polling loop)

Because of this, any attempt to call thread_isolate() or thread_isolate_full()
during a startup phase with nbthreads >= 2 will cause thread_isolate to
loop until every secondary thread makes it through its first polling loop.

If the startup phase is aborted during boot (ie: "-c" option to check the
configuration), secondary threads may be initialized but will never be started
(ie: they won't enter the polling loop), thus thread_isolate()
would loop forever in such cases.

We can easily reveal the bug with this patch reproducer:

    |  diff --git a/src/haproxy.c b/src/haproxy.c
    |  index e91691658..0b733f6ee 100644
    |  --- a/src/haproxy.c
    |  +++ b/src/haproxy.c
    |  @@ -2317,6 +2317,10 @@ static void init(int argc, char **argv)
    |   		if (pr || px) {
    |   			/* At least one peer or one listener has been found */
    |   			qfprintf(stdout, "Configuration file is valid\n");
    |  +			printf("haproxy will loop...\n");
    |  +			thread_isolate();
    |  +			printf("we will never reach this\n");
    |  +			thread_release();
    |   			deinit_and_exit(0);
    |   		}
    |   		qfprintf(stdout, "Configuration file has no error but will not start (no listener) => exit(2).\n");

Now we start haproxy with a valid config:
$> haproxy -c -f valid.conf
Configuration file is valid
haproxy will loop...

^C

------------------------------------------------------------------------------

This did not cause any issue so far because no early deinit paths require
full thread isolation. But this may change when new features or requirements
are introduced, so we should fix this before it becomes a real issue.

To fix this, we explicitly assign .threads_harmless and .threads_idle
to .threads_enabled value in thread_map_to_groups() function during boot.
This is the proper place to do this since as long as .threads_enabled is not
explicitly set, its default value is also 0 (zero-initialized by the compiler).
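The fix itself boils down to something like this in thread_map_to_groups() (a hedged sketch; the loop shape and variable names are illustrative):

    /* not-yet-started threads must appear harmless and idle so that
     * thread_isolate() does not wait for them during early boot
     */
    for (g = 0; g < global.nbtgroups; g++) {
        ha_tgroup_ctx[g].threads_harmless = ha_tgroup_info[g].threads_enabled;
        ha_tgroup_ctx[g].threads_idle     = ha_tgroup_info[g].threads_enabled;
    }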

code snippet from thread_isolate() function:
       ulong te = _HA_ATOMIC_LOAD(&ha_tgroup_info[tgrp].threads_enabled);
       ulong th = _HA_ATOMIC_LOAD(&ha_tgroup_ctx[tgrp].threads_harmless);

       if ((th & te) == te)
           break;

Thus thread_isolate(_full)() won't loop forever even if it were to be
used before thread_map_to_groups() is executed.

No backport needed unless this is a requirement.
2023-02-02 08:21:15 +01:00
Willy Tarreau
b2f38c13d1 BUG/MINOR: thread: always reload threads_enabled in loops
A few loops waiting for threads to synchronize such as thread_isolate()
rightfully filter the thread masks via the threads_enabled field that
contains the list of enabled threads. However, it doesn't use an atomic
load on it. Before 2.7, the equivalent variables were marked as volatile
and were always reloaded. In 2.7 they're fields in ha_tgroup_ctx[], and
the risk that the compiler keeps them in a register inside a loop is not
null at all. In practice when ha_thread_relax() calls sched_yield() or
an x86 PAUSE instruction, it could be verified that the variable is
always reloaded. If these are avoided (e.g. architecture providing
neither solution), it's visible in asm code that the variables are not
reloaded. In this case, if a thread exits just between the moment the
two values are read, the loop could spin forever.

This patch adds the required _HA_ATOMIC_LOAD() on the relevant
threads_enabled fields. It must be backported to 2.7.
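Concretely, the change is from a plain load that the compiler may keep in a register to a forced reload (hedged sketch):

    /* before: the value may be kept in a register across loop iterations */
    ulong te = ha_tgroup_info[tgrp].threads_enabled;

    /* after: explicitly reloaded from memory on each evaluation */
    ulong te = _HA_ATOMIC_LOAD(&ha_tgroup_info[tgrp].threads_enabled);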
2023-01-19 19:22:17 +01:00
Remi Tricot-Le Breton
2b96364b35 MINOR: ssl: Add a lock to the OCSP response tree
The tree that contains OCSP responses is never locked despite being used
at runtime for OCSP stapling as well as the CLI through "set ssl cert"
and "set ssl ocsp-response" commands.
Everything works though because the certificate_ocsp structure is
refcounted and the tree's entries are cleaned up when SSL_CTXs are
destroyed (thanks to an ex_data entry in which the certificate_ocsp
pointer is stored).
This new lock will come to use when the OCSP auto update mechanism is
fully implemented because this new feature will be based on another tree
that stores the same certificate_ocsp members and updates their contents
periodically.
2022-12-21 11:21:07 +01:00
Christopher Faulet
5534334f1f MEDIUM: thread: Restrict nbthread/thread-group(s) to very first global sections
nbthread, thread-group and thread-groups directives must only be defined in
the very first global sections. It means no other section must have been parsed
before. Indeed, some parts of the configuration depend on the value of these
settings and it is undefined to change them afterwards.
2022-11-18 16:03:45 +01:00
Willy Tarreau
c80bdb2da6 MINOR: threads: report the number of thread groups in build options
haproxy -vv shows the number of threads but didn't report the number
of groups, let's add it.
2022-08-06 16:45:26 +02:00
Willy Tarreau
87aff021db MINOR: thread: provide an alternative to pthread's rwlock
Since version 1.1.0, OpenSSL's libcrypto ignores the provided locking
mechanism and uses pthread's rwlocks instead. The problem is that for
some code paths (e.g. async engines) this results in a huge amount of
syscalls on systems facing a bit of contention, to the point where more
than 80% of the CPU can be spent in the system dealing with spinlocks
just for futex_wake().

This patch provides an alternative by reimplementing the relevant pthread
rwlocks on top of the low-overhead progressive rw locks. This
way there will be no more syscalls in case of contention, and CPU will
be burnt in userland. Doing this saves massive amounts of CPU, where
the locks only take 12-15% vs 80% before, which allows SSL to work much
faster on large thread counts (e.g. 24 or more).

The tryrdlock and trywrlock variants have been implemented using a CAS
since their goal is only to succeed on no contention and never to wait.
The pthread_rwlock API is complete except that the timed versions of
the rdlock and wrlock do not wait and simply fall back to trylock
versions.

Since the gains have only been observed with async engines for now,
this option remains disabled by default. It can be enabled at build
time using USE_PTHREAD_EMULATION=1.
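As an illustration of the try variants, a single compare-and-swap on the lock word is enough, with no kernel fallback (a hedged sketch using generic compiler atomics, not the actual plock-based code):

    #include <errno.h>

    /* sketch: grab the write lock only if the word is currently free */
    static inline int emu_rwlock_trywrlock(unsigned long *lock)
    {
        unsigned long expected = 0;

        if (__atomic_compare_exchange_n(lock, &expected, 1UL, 0,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
            return 0;       /* acquired, pthread-style success */
        return EBUSY;       /* contended: fail immediately, no syscall */
    }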
2022-07-30 10:17:22 +02:00
Willy Tarreau
c6b596dcce CLEANUP: threads: remove the now unused all_threads_mask and tid_bit
Since these are not used anymore, let's now remove them. Given the
number of places where we're using ti->ltid_bit, maybe an equivalent
might be useful though.
2022-07-15 20:25:41 +02:00
Willy Tarreau
6018c02c36 MEDIUM: thread: change thread_resolve_group_mask() to return group-local values
It used to turn group+local to global but now we're doing the exact
opposite as we want to stick to group-local masks. This means that
"thread 3-4" might very well emit what "thread 2/1-2" used to emit
till now for 2 groups and 4 threads. This is needed because we'll
have to support group-local thread masks in receivers.

However the rest of the code (receivers) is not ready yet for this,
so using this code with more than one thread group will definitely
break some bindings.
2022-07-15 20:16:30 +02:00
Willy Tarreau
5b09341c02 MEDIUM: cpu-map: replace the process number with the thread group number
The principle remains the same, but instead of having a single process
and ignoring extra ones, now we set the affinity masks for the respective
threads of all groups.

The doc was updated with a few extra examples.
2022-07-15 19:43:10 +02:00
Willy Tarreau
1b2b59bfa7 MINOR: thread: remove MAX_THREADS limitation
This one is now causing difficulties during the development phase and
it's going to disappear anyway, let's get rid of it.
2022-07-15 19:43:10 +02:00
Willy Tarreau
7aa41196cf MEDIUM: debug/threads: make the lock debugging take tgroups into account
Since we have to use masks to verify owners/waiters, we have no other
option but to have them per group. This definitely inflates the size
of the locks, but this is only used for extreme debugging anyway so
that's not dramatic.

Thus as of now, all masks in the lock stats are local bit masks, derived
from ti->ltid_bit. Since at boot ltid_bit might not be set, we just take
care of this situation (since some structs are initialized under lock
during boot), and use bit 0 from group 0 only.
2022-07-15 19:41:26 +02:00
Willy Tarreau
f15c75a2d3 BUG/MINOR: thread: use the correct thread's group in ha_tkillall()
In ha_tkillall(), the current thread's group was used to check for the
thread being running instead of using the target thread's group mask.
Most of the time it would not have any effect unless some groups are
uneven where it can lead to incomplete thread dumps for example.

No backport is needed, this is purely 2.7.
2022-07-15 19:41:26 +02:00
Willy Tarreau
9b0f0d146f BUG/MINOR: threads: produce correct global mask for tgroup > 1
In thread_resolve_group_mask(), if a global thread number is passed
and it belongs to a group greater than 1, an incorrect shift resulted
in shifting that ID again, possibly making it appear nowhere or in a
wrong group. The bug was introduced in 2.5 with commit 627def9e5
("MINOR: threads: add a new function to resolve config groups and
masks") though groups only start to be usable in 2.7, so there
is no impact for this bug, hence no backport is needed.
2022-07-15 19:41:26 +02:00
Willy Tarreau
598cf3f22e MAJOR: threads: change thread_isolate to support inter-group synchronization
thread_isolate() and thread_isolate_full() were relying on a set of thread
masks for all threads in different states (rdv, harmless, idle). This cannot
work anymore when the number of threads increases beyond LONGBITS so we need
to change the mechanism.

What is done here is to have a counter of requesters and the number of the
current isolated thread. Threads which want to isolate themselves increment
the request counter and wait for all threads to be marked harmless (or idle)
by scanning all groups and watching the respective masks. This is possible
because threads cannot escape once they discover this counter, unless they
also want to isolate and possibly pass first. Once all threads are harmless,
the requesting thread tries to self-assign the isolated thread number, and
if it fails it loops back to checking all threads. If it wins it's guaranteed
to be alone, and can drop its harmless bit, so that other competing threads
go back to the loop waiting for all threads to be harmless. The benefit of
proceeding this way is that there's very little write contention on the
thread number (none during work), hence no cache line moves between caches,
thus frozen threads do not slow down the isolated one.

Once it's done, the isolated thread resets the thread number (hence lets
another thread take the place) and decrements the requester count, thus
possibly releasing all harmless threads.

With this change there's no more need for any global mask to synchronize
any thread, and we only need to loop over a number of groups to check
64 threads at a time per iteration. As such, tinfo's threads_want_rdv
could be dropped.
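Schematically, the sequence described above can be sketched as follows (hedged pseudocode; variable and helper names are simplified):

    /* announce ourselves as a requester, then wait to be the isolated one */
    HA_ATOMIC_INC(&rdv_requests);

    for (;;) {
        /* wait for every thread of every group to be harmless (or idle) */
        for (grp = 0; grp < global.nbtgroups; grp++)
            wait_group_harmless(grp);

        /* try to self-assign the isolated thread number; on failure,
         * loop back and recheck all groups
         */
        unsigned int prev = ~0U;
        if (HA_ATOMIC_CAS(&isolated_thread, &prev, tid))
            break;
    }
    /* we are alone now: drop our harmless bit so competing requesters
     * keep waiting; on completion the number is reset and rdv_requests
     * is decremented, releasing everyone
     */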

This was tested with 64 threads spread into 2 groups, running 64 tasks
(from the debug dev command), 20 "show sess" (thread_isolate()), 20
"add server blah/blah" (thread_isolate()), and 20 "del server blah/blah"
(thread_isolate_full()). The load remained very low (limited by external
socat forks) and no stuck nor starved thread was found.
2022-07-01 19:15:15 +02:00
Willy Tarreau
03f9b35114 MEDIUM: tinfo: add a dynamic thread-group context
The thread group info is not sufficient to represent a thread group's
current state as it's read-only. We also need something comparable to
the thread context to represent the aggregate state of the threads in
that group. This patch introduces ha_tgroup_ctx[] and tg_ctx for this.
It's indexed on the group id and must be cache-line aligned. The thread
masks that were global and that do not need to remain global were moved
there (want_rdv, harmless, idle).

Given that all the masks placed there now become group-specific, the
associated thread mask (tid_bit) now switches to the thread's local
bit (ltid_bit). Both are the same for nbtgroups 1 but will differ for
other values.

There's also a tg_ctx pointer in the thread so that it can be reached
from other threads.
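A hedged sketch of the per-group context introduced here (the real structure holds more fields and is cache-line aligned):

    /* aggregate, writable state of the threads of one group,
     * indexed by group id (ha_tgroup_ctx[] in the patch)
     */
    struct tgroup_ctx_sketch {
        ulong threads_want_rdv;   /* threads requesting a rendez-vous */
        ulong threads_harmless;   /* threads currently harmless       */
        ulong threads_idle;       /* threads currently idle           */
        /* ... */
    };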
2022-07-01 19:15:15 +02:00
Willy Tarreau
22b2a24eb2 CLEANUP: thread: remove thread_sync_release() and thread_sync_mask
This function was added in 2.0 when reworking the thread isolation
mechanism to make it more reliable. However it is fundamentally
incompatible with the full isolation mechanism provided by
thread_isolate_full() since that one will wait for all threads to
become idle while the former will wait for all threads to finish
waiting, causing a deadlock.

Given that it's not used, let's just drop it entirely before it gets
used by accident.
2022-07-01 19:15:15 +02:00
Willy Tarreau
cce203aae5 MINOR: thread: add a new all_tgroups_mask variable to know about active tgroups
In order to kill all_threads_mask we'll need to have an equivalent for
the thread groups. The all_tgroups_mask does just this, it keeps one bit
set per enabled group.
2022-07-01 19:15:15 +02:00
Willy Tarreau
c6cf64bb5e MINOR: thread: use ltid_bit in ha_tkillall()
Since commit cc7a11ee3 ("MINOR: threads: set the tid, ltid and their bit
in thread_cfg") we ought not use (1UL << thr) to get the group mask for
thread <thr>, but (ha_thread_info[thr].ltid_bit). ha_tkillall() needs
this.
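In other words (hedged sketch; the surrounding mask is a placeholder):

    /* before: builds the bit from the global thread ID, wrong with groups */
    int running_old = !!(group_mask & (1UL << thr));

    /* after: use the thread's precomputed local bit within its group */
    int running_new = !!(group_mask & ha_thread_info[thr].ltid_bit);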
2022-07-01 19:15:15 +02:00
Willy Tarreau
377e37a80f MINOR: tinfo: add the mask of enabled threads in each group
In order to replace the global "all_threads_mask" we'll need to have an
equivalent per group. Take this opportunity to call it threads_enabled
and make clear which ones are counted there (in case in the future we
allow stopping some).
2022-07-01 19:15:14 +02:00
Willy Tarreau
60fe4a95a2 MINOR: tinfo: replace the tgid with tgid_bit in tgroup_info
Now that the tgid is accessible from the thread, it's pointless to have
it in the group, and it was only set but never used. However we'll soon
frequently need the mask corresponding to the group ID, and the risk of
getting it wrong with the +1 or by shifting 1 instead of 1UL is high,
so let's store the tgid_bit there.
2022-07-01 19:15:14 +02:00
Willy Tarreau
66ad98a772 MINOR: tinfo: add the tgid to the thread_info struct
At several places we're dereferencing the thread group just to catch
the group number, and this will become even more required once we start
to use per-group contexts. Let's just add the tgid in the thread_info
struct to make this easier.
2022-07-01 19:15:14 +02:00
Willy Tarreau
1a85a958dd MINOR: tinfo: remove the global thread ID bit (tid_bit)
Each thread has its own local thread id and its own global thread id,
in addition to the masks corresponding to each. Once the global thread
ID can go beyond 64 it will not be possible to have a global thread Id
bit anymore, so better start to remove it and use only the local one
from the struct thread_info.
2022-06-14 10:44:38 +02:00
Willy Tarreau
7e2e4f8401 CLEANUP: tree-wide: remove 25 occurrences of unneeded fcntl.h
There were plenty of leftovers from old code that were never removed
and that are not needed at all since these files do not use any
definition depending on fcntl.h, let's drop them.
2022-04-26 10:59:48 +02:00
Willy Tarreau
8ead1d084a BUILD: thread: use initcall instead of a constructor
The constructor present there could be replaced with an initcall.
This one is set at level STG_PREPARE because it also zeroes the
lock_stats, and it's a bit odd that it could possibly have been
scheduled to run after other constructors that might already
preset some of these locks by accident.
2022-04-25 19:23:17 +02:00
Willy Tarreau
627def9e50 MINOR: threads: add a new function to resolve config groups and masks
In the configuration sometimes we'll omit a thread group number to designate
a global thread number range, and sometimes we'll mention the group and
designate IDs within that group. The operation is more complex than it
seems due to the need to check for ranges spanning multiple groups,
to determine groups from thread bit masks, and to remap bit masks
between local/global.

This patch adds a function to perform this operation, it takes a group and
mask on input and updates them on output. It's designed to be used by "bind"
lines but will likely be usable at other places if needed.

For situations where specified threads do not exist in the group, we have
the choice in the code between silently fixing the thread set or failing
with a message. For now the better option seems to return an error, but if
it turns out to be an issue we can easily change that in the future. Note
that it should only happen with "x/even" when group x only has one thread.
2021-10-08 17:22:26 +02:00
Willy Tarreau
b90935c908 MINOR: threads: add the current group ID in thread-local "tgid" variable
This is the equivalent of "tid" for ease of access. In the future if we
make th_cfg a pure thread-local array (not a pointer), it may make sense
to move it there.
2021-10-08 17:22:26 +02:00
Willy Tarreau
43ab05b3da MEDIUM: threads: replace ha_set_tid() with ha_set_thread()
ha_set_tid() was randomly used either to explicitly set thread 0 or to
set any possibly incomplete thread during boot. Let's replace it with
a pointer to a valid thread or NULL for any thread. This allows us to
check that the designated threads are always valid, and to ignore the
thread 0's mapping when setting it to NULL, and always use group 0 with
it during boot.

The initialization code is also cleaner, as we don't pass ugly casts
of a thread ID to a pointer anymore.
2021-10-08 17:22:26 +02:00