Commit Graph

24099 Commits

Author SHA1 Message Date
Willy Tarreau
f1210ee7c6 MEDIUM: cfgparse: remove now unused numa & thread-count detection
Ths is not needed anymore since already done before landing here
via thread_detect_count().
2025-03-14 18:30:30 +01:00
Willy Tarreau
e3aef4c9a4 MEDIUM: thread: reimplement first numa node detection
Let's reimplement automatic binding to the first NUMA node when thread
count is not forced. It's the same thing as is already done in
check_config_validity() except that this time it's based on the
collected CPU information. The threads are automatically counted
and CPUs from non-first node(s) are evicted.
2025-03-14 18:30:30 +01:00
Willy Tarreau
4a525e8d27 MEDIUM: cpu-topo: make sure to properly assign CPUs to threads as a fallback
If no cpu-map is done and no cpu-policy could be enforced, we still need
to count the number of usable CPUs, assign them to all threads and set
the nbthread value accordingly.

This already handles the part that was done in check_config_validity()
via thread_cpus_enabled_at_boot.
2025-03-14 18:30:30 +01:00
Willy Tarreau
1af4942c95 MEDIUM: thread: start to detect thread groups and threads min/max
By mutually refining the thread count and group count, we can try
to detect the most suitable setup for the current machine. Taskset
is implicitly handled correctly. tgroups automatically adapt to the
configured number of threads. cpu-map manages to limit tgroups to
the smallest supported value.

The thread-limit is enforced. Just like in cfgparse, if the thread
count was forced to a higher value, it's reduced and a warning is
emitted. But if it was not set, the thr_max value is bound to this
limit so that further calculations respect it.

We continue to default to the max number of available threads and 1
tgroup by default, with the limit. This normally allows to get rid
of that test in check_config_validity().
2025-03-14 18:30:30 +01:00
Willy Tarreau
68069e4b27 MINOR: cpu-topo: add "drop-cpu" and "only-cpu" to cpu-set
These allow respectively to disable binding to CPUs listed in a set, and
to disable binding to CPUs not in a set.
2025-03-14 18:30:30 +01:00
Willy Tarreau
cda4956d9c MINOR: cpu-topo: add a new "cpu-set" global directive to choose cpus
For now it's limited, it only supports "reset" to ask that any previous
"taskset" be ignored. The goal will be to later add more actions that
allow to symbolically define sets of cpus to bind to or to drop. This
also clears the cpu_mask_forced variable that is used to detect
that a taskset had been used.
2025-03-14 18:30:30 +01:00
Willy Tarreau
f0661e79fe MINOR: global: add a command-line option to enable CPU binding debugging
During development, everything related to CPU binding and the CPU topology
is debugged using state dumps at various places, but it does make sense to
have a real command line option so that this remains usable in production
to help users figure why some CPUs are not used by default. Let's add
"-dc" for this. Since the list of global.tune.options values is almost
full and does not 100% match this option, let's add a new "tune.debug"
field for this.
2025-03-14 18:30:30 +01:00
Willy Tarreau
94543d7b65 MINOR: cfgparse: use already known offline CPU information
No need to reparse cpu/online, let's just rely on the info we learned
previously about offline CPUs.
2025-03-14 18:30:30 +01:00
Willy Tarreau
1560827c9d MINOR: cfgparse: move the binding detection into numa_detect_topology()
For now the function refrains from detecting the CPU topology when a
restrictive taskset or cpu-map was already performed on the process,
and it's documented as such, the reason being that until we're able
to automatically create groups, better not change user settings. But
we'll need to be able to detect bound CPUs and to process them as
desired by the user, so we now need to move that detection into the
function itself. It changes nothing to the logic, just gives more
freedom to the function.
2025-03-14 18:30:30 +01:00
Willy Tarreau
ac1db9db7d MINOR: thread: turn thread_cpu_mask_forced() into an init-time variable
The function is not convenient because it doesn't allow us to undo the
startup changes, and depending on where it's being used, we don't know
whether the values read have already been altered (this is not the case
right now but it's going to evolve).

Let's just compute the status during cpu_detect_usable() and set a
variable accordingly. This way we'll always read the init value, and
if needed we can even afford to reset it. Also, placing it in cpu_topo.c
limits cross-file dependencies (e.g. threads without affinity etc).
2025-03-14 18:30:30 +01:00
Willy Tarreau
3a7cc676fa MINOR: cpu-topo: add NUMA node identification to CPUs on FreeBSD
With this patch we're also NUMA node IDs to each CPU when the info is
found. The code is highly inspired from the one in commit f5d48f8b3
("MEDIUM: cfgparse: numa detect topology on FreeBSD."), the difference
being that we're just setting the value in ha_cpu_topo[].
2025-03-14 18:30:30 +01:00
Willy Tarreau
f6154c079e MINOR: cpu-topo: add NUMA node identification to CPUs on Linux
With this patch we're also assigning NUMA node IDs to each CPU when one
is found. The code is highly inspired from the one in commit b56a7c89a
("MEDIUM: cfgparse: detect numa and set affinity if needed") that already
did the job, except that it could be simplified since we're just collecting
info to fill the ha_cpu_topo[] array.
2025-03-14 18:30:30 +01:00
Willy Tarreau
65612369e7 MINOR: cpu-topo: also store the sibling ID with SMT
The sibling ID was not reported because it's not directly accessible
but we don't care, what matters is that we assign numbers to all the
threads we find using the same CPU so that some strategies permit to
allocate one thread at a time if we want to use few threads with max
performance.
2025-03-14 18:30:30 +01:00
Willy Tarreau
7cb274439b MINOR: cpu-topo: add CPU topology detection for linux
This uses the publicly available information from /sys to figure the cache
and package arrangements between logical CPUs and fill ha_cpu_topo[], as
well as their SMT capabilities and relative capacity for those which expose
this. The functions clearly have to be OS-specific.
2025-03-14 18:30:30 +01:00
Willy Tarreau
12f3a2bbb7 MINOR: cpu-topo: try to detect offline cpus at boot
When possible, the offline CPUs are detected at boot and their OFFLINE
flag is set in the ha_cpu_topo[] array. When the detection is not
possible (e.g. not linux, /sys not mounted etc), we just mark none of
them as being offline, as we don't want to infer wrong info that could
hinder automatic CPU placement detection. When valid, we take this
opportunity for refining cpu_topo_lastcpu so that we don't need to
manipulate CPUs beyond this value.
2025-03-14 18:30:30 +01:00
Willy Tarreau
44881e5abf MINOR: cpu-topo: add detection of online CPUs on FreeBSD
On FreeBSD we can detect online CPUs at least by doing the bitwise-OR of
the CPUs of all domains, so we're using this and adding this detection
to ha_cpuset_detect_online(). If we find simpler later, we can always
rework it, but it's reasonably inexpensive since we only check existing
domains.
2025-03-14 18:30:30 +01:00
Willy Tarreau
8f72ce335a MINOR: cpu-topo: add detection of online CPUs on Linux
This adds a generic function ha_cpuset_detect_online() which for now
only supports linux via /sys. It fills a cpuset with the list of online
CPUs that were detected (or returns a failure).
2025-03-14 18:30:30 +01:00
Willy Tarreau
8c524c7c9d REORG: cpu-topo: move bound cpu detection from cpuset to cpu-topo
The cpuset files are normally used only for cpu manipulations. It happens
that the initial CPU binding detection was initially placed there since
there was no better place, but in practice, being OS-specific, it should
really be in cpu-topo. This simplifies cpuset which doesn't need to know
about the OS anymore.
2025-03-14 18:30:30 +01:00
Willy Tarreau
a6fdc3eaf0 MINOR: cpu-topo: update CPU topology from excluded CPUs at boot
Now before trying to resolve the thread assignment to groups, we detect
which CPUs are not bound at boot so that we can mark them with
HA_CPU_F_EXCLUDED. This will be useful to better know on which CPUs we
can count later. Note that we purposely ignore cpu-map here as we
don't know how threads and groups will map to cpu-map entries, hence
which CPUs will really be used.

It's important to proceed this way so that when we have no info we
assume they're all available.
2025-03-14 18:30:30 +01:00
Willy Tarreau
bdb731172c MINOR: cpu-topo: add a function to dump CPU topology
The new function cpu_dump_topology() will centralize most debugging
calls, and it can make efforts of not dumping some possibly irrelevant
fields (e.g. non-existing cache levels).
2025-03-14 18:30:30 +01:00
Willy Tarreau
041462c4af MINOR: cpu-topo: rely on _SC_NPROCESSORS_CONF to trim maxcpus
We don't want to constantly deal with as many CPUs as a cpuset can hold,
so let's first try to trim the value to what the system claims to support
via _SC_NPROCESSORS_CONF. It is obviously still subject to the limit of
the cpuset size though. The value is stored globally so that we can
reuse it elsewhere after initialization.
2025-03-14 18:30:30 +01:00
Willy Tarreau
656cedad42 MINOR: cpu-topo: allocate and initialize the ha_cpu_topo array.
This does the bare minimum to allocate and initialize a global
ha_cpu_topo array for the number of supported CPUs and release
it at deinit time.
2025-03-14 18:30:30 +01:00
Willy Tarreau
d165f5d3ab MINOR: cpu-topo: add ha_cpu_topo definition
This structure will be used to store information about each CPU's
topology (package ID, L3 cache ID, NUMA node ID etc). This will be used
in conjunction with CPU affinity setting to try to perform a mostly
optimal binding between threads and CPU numbers by default. Since it
was noticed during tests that absolutely none of the many machines
tested reports different die numbers, the die_id is not stored.
Also, it was found along experiments that the cluster ID will be used
a lot, half of the time as a node-local identifier, and half of the
time as a global identifier. So let's store the two versions at once
(cl_gid, cl_lid).

Some flags are added to indicate causes of exclusion (offline, excluded
at boot, excluded by rules, ignored by policy).
2025-03-14 18:30:30 +01:00
Willy Tarreau
05a4efb102 MINOR: thread: rely on the cpuset functions to count bound CPUs
let's just clean up the thread_cpus_enabled() code a little bit
by removing the OS-specific code and rely on ha_cpuset_detect_bound()
instead. On macos we continue to use sysconf() for now.
2025-03-14 18:30:30 +01:00
Willy Tarreau
32bb68e736 MINOR: cpuset: make the API support negative CPU IDs
Negative IDs are very convenient to mean "not set", so let's just make
the cpuset API robust against this, especially with ha_cpuset_isset()
so that we don't have to manually add this check everywhere when a
value is not known.
2025-03-14 18:30:30 +01:00
Willy Tarreau
f156baf8ce DOC: design-thoughts: commit numa-auto.txt
Lots of collected data and observations aggregated into a single commit
so as not to lose them. Some parts below come from several commit
messages and are incremental.

Add captures and analysis of intel 14900 where it's not easy to draw
the line between the desired P and E cores.

The 14900 raises some questions (imagine a dual-die variant in multi-socket).
That's the start of an algorithmic distribution of performance cores into
thread groups.

cpu-map currently conflicts a lot with the choices after auto-detection
but it doesn't have to. The problem is the inability to configure the
threads for the whole process like taskset does. By offering this ability
we can also start to designate groups of CPUs symbolically (package, die,
ccx, cores, smt).

It can also be useful to exploit the info from cpuinfo that is not
available in /sys, such as the model number. At least on arm, higher
numbers indicate bigger cores and can be useful to distinguish cores
inside a cluster. It will not indicate big vs medium ones of the same
type (e.g. a78 3.0 vs 2.4 GHz) but can still be effective at identifying
the efficient ones.

In short, infos such as cluster ID not always reliable, and are
local to the package. die_id as well. die number is not reported
here but should definitely be used, as a higher priority than L3.

We're still missing a discriminant between the l3 and cluster number
in order to address heterogenous CPUs (e.g. intel 14900), though in
terms of locality that's currently done correctly.

CPU selection is also a full topic, and some thoughts were noted
regarding sorting by perf vs locality so as never to mix inter-
socket CPUs due to sorting.

The proposed cpu-selection cannot work as-is, because it acts both on
restriction and preference, and these two are not actions but a sequence.
First restrictions must be enforced, and second the remaining CPUs are
sorted according to the preferred criterion, and a number of threads are
selected.

Currently we refine the OS-exposed cluster number but it's not correct
as we can end up with something poorly numbered. We need to respect the
LLC in any case so let's explain the approach.
2025-03-14 18:30:30 +01:00
Willy Tarreau
0ceb1f2c51 DEV: ncpu: also emulate sysconf() for _SC_NPROCESSORS_*
This is also needed in order to make the requested number of CPUs
appear. For now we don't reroute to the original sysconf() call so
we return -1,EINVAL for all other info.
2025-03-14 18:30:30 +01:00
Willy Tarreau
ed75148ca0 BUILD: tools: avoid a build warning on gcc-4.8 in resolve_sym_name()
A build warning is emitted with gcc-4.8 in tools.c since commit
e920d73f59 ("MINOR: tools: improve symbol resolution without dl_addr")
because the compiler doesn't see that <size> is necessarily initialized.
Let's just preset it.
2025-03-14 18:30:30 +01:00
Willy Tarreau
4e09789644 MINOR: tools: teach resolve_sym_name() a few more common symbols
This adds run_poll_loop, run_tasks_from_lists, process_runnable_tasks,
ha_dump_backtrace and cli_io_handler which are fairly common in
backtraces. This will be less relative symbols when dladdr is not
usable.
2025-03-13 17:31:16 +01:00
Willy Tarreau
a3582a77f7 MINOR: tools: ease the declaration of known symbols in resolve_sym_name()
Let's have a macro that declares both the symbol and its name, it will
avoid the risk of introducing typos, and encourages adding more when
needed. The macro also takes an optional second argument to permit an
inline declaration of an extern symbol.
2025-03-13 17:30:48 +01:00
Willy Tarreau
e920d73f59 MINOR: tools: improve symbol resolution without dl_addr
When dl_addr is not usable or fails, better fall back to the closest
symbol among the known ones instead of providing everything relative
to main. Most often, the location of the function will give some hints
about what it can be. Thus now we can emit fct+0xXXX in addition to
main+0xXXX or main-0xXXX. We keep a margin of +256kB maximum after a
function for a match, which is around the maximum size met in an object
file, otherwise it becomes pointless again.
2025-03-13 17:30:48 +01:00
Willy Tarreau
1e99efccef MINOR: cli: export cli_io_handler() to ease symbol resolution
It's common to meet this function in backtraces, it's a bit annoying
that it's not resolved, so let's export it so that it becomes resolvable.
2025-03-13 17:30:48 +01:00
Aurelien DARRAGON
8311be5ac6 BUG/MINOR: stats: fix capabilities and hide settings for some generic metrics
Performing a diff on stats output before vs after commit 66152526
("MEDIUM: stats: convert counters to new column definition") revealed
that some metrics were not properly ported to to the new API. Namely,
"lbtot", "cli_abrt" and "srv_abrt" are now exposed on frontend and
listeners while it was not the case before.

Also, "hrsp_other" is exposed even when "mode http" wasn't set on the
proxy.

In this patch we restore original behavior by fixing the capabilities
and hide settings.

As this could be considered as a minor regression (looking at the commit
message it doesn't seem intended), better tag this as a bug. It should be
backported in 3.0 with 66152526.
2025-03-13 11:49:18 +01:00
Aurelien DARRAGON
4c3eb60e70 DOC: management: rename some last occurences from domain "dns" to "resolvers"
This is a complementary patch to cf913c2f9 ("DOC: management: rename show
stats domain cli "dns" to "resolvers"). The doc still refered to the
legacy "dns" domain filter for stat command. Let's rename those occurences
to "resolvers".

It may be backported to all stable versions.
2025-03-13 11:49:10 +01:00
Willy Tarreau
78ef52dbd1 BUILD: backend: silence a build warning when threads are disabled
Since commit 8de8ed4f48 ("MEDIUM: connections: Allow taking over
connections from other tgroups.") we got this partially absurd
build warning when disabling threads:

  src/backend.c: In function 'conn_backend_get':
  src/backend.c:1371:27: warning: array subscript [0, 0] is outside array bounds of 'struct tgroup_info[1]' [-Warray-bounds]

The reason is that gcc sees that curtgid is not equal to tgid which is
defined as 1 in this case, thus it figures that tgroup_info[curtgid-1]
will be anything but zero and that doesn't fit. It is ridiculous as it
is a perfect case of dead code elimination which should not warrant a
warning. Nevertheless we know we don't need to do this when threads are
disabled and in this case there will not be more than 1 thread group, so
we can happily use that preliminary test to help the compiler eliminate
the dead condition and avoid spitting this warning.

No backport is needed.
2025-03-12 18:16:14 +01:00
Willy Tarreau
b61ed9babe BUILD: tools: silence a build warning when USE_THREAD=0
The dladdr_lock that was added to avoid re-entering into dladdr is
conditioned by threads, but the way it's declared causes a build
warning if threads are disabled due to the insertion of a lone semi
colon in the variables block. Let's switch to __decl_thread_var()
for this.

This can be backported wherever commit eb41d768f9 ("MINOR: tools:
use only opportunistic symbols resolution") is backported. It relies
on these previous two commits:

   bb4addabb7 ("MINOR: compiler: add a simple macro to concatenate resolved strings")
   69ac4cd315 ("MINOR: compiler: add a new __decl_thread_var() macro to declare local variables")
2025-03-12 18:11:14 +01:00
Willy Tarreau
69ac4cd315 MINOR: compiler: add a new __decl_thread_var() macro to declare local variables
__decl_thread() already exists but is more suited for struct members.
When using it in a variables block, it appends the final trailing
semi-colon which is a statement that ends the variable block. Better
clean this up and have one precisely for variable blocks. In this
case we can simply define an unused enum value that will consume the
semi-colon. That's what the new macro __decl_thread_var() does.
2025-03-12 18:08:12 +01:00
Willy Tarreau
bb4addabb7 MINOR: compiler: add a simple macro to concatenate resolved strings
It's often useful to be able to concatenate strings after resolving
them (e.g. __FILE__, __LINE__ etc). Let's just have a CONCAT() macro
to do that, which calls _CONCAT() with the same arguments to make
sure the contents are resolved before being concatenated.
2025-03-12 18:06:55 +01:00
Willy Tarreau
12383fd9f5 BUG/MEDIUM: thread: use pthread_self() not ha_pthread[tid] in set_affinity
A bug was uncovered by the work on NUMA. It only triggers in the CI
with libmusl due to a race condition. What happens is that the call
to set_thread_cpu_affinity() is done very early in the polling loop,
and that it relies on ha_pthread[tid] instead of pthread_self(). The
problem is that ha_pthread[tid] is only set by the return from
pthread_create(), which might happen later depending on the number of
CPUs available to run the starting thread.

Let's just use pthread_self() here. ha_pthread[] is only used to send
signals between threads, there's no point in using it here.

This can be backported to 2.6.
2025-03-12 15:59:23 +01:00
Aurelien DARRAGON
e942305214 MEDIUM: log: change default "host" strategy for log-forward section
Historically, log-forward proxy used to preserve host field from input
message as much as possible, and if syslog host wasn't provided
(rfc5424 '-' or bad rfc3164 or rfc5424 message) then "localhost" or "-"
would be used as host when outputting message using rfc3164 or rfc5424.

We change that behavior (which corresponds to "keep" host option), so that
log-forward now uses "fill" strategy as default: if the host is provided
in input message, it is preserved. However if it is missing and IP address
from sender is available, we use it.
2025-03-12 10:55:49 +01:00
Aurelien DARRAGON
ad0133cc50 MINOR: log: handle log-forward "option host"
Following previous patch, we know implement the logic for the host
option under log-forward section. Possible strategies are:

      replace If input message already contains a value for the host
              field, we replace it by the source IP address from the
              sender.
              If input message doesn't contain a value for the host field
              (ie: '-' as input rfc5424 message or non compliant rfc3164
              or rfc5424 message), we use the source IP address from the
              sender as host field.

      fill    If input message already contains a value for the host field,
              we keep it.
              If input message doesn't contain a value for the host field
              (ie: '-' as input rfc5424 message or non compliant rfc3164
              or rfc5424 message), we use the source IP address from the
              sender as host field.

      keep    If input message already contains a value for the host field,
              we keep it.
              If input message doesn't contain a value for the host field,
              we set it to localhost (rfc3164) or '-' (rfc5424).
              (This is the default)

      append  If input message already contains a value for the host field,
              we append a comma followed by the IP address from the sender.
              If input message doesn't contain a value for the host field,
              we use the source IP address from the sender.

Default value (unchanged) is "keep" strategy. option host is only relevant
with rfc3164 or rfc5424 format on log targets. Also, if the source address
is not available (ie: UNIX socket), default behavior prevails.

Documentation was updated.
2025-03-12 10:52:07 +01:00
Aurelien DARRAGON
003fe530ae MINOR: log: add "option host" log-forward option
add only the parsing part, options are currently unused
2025-03-12 10:51:35 +01:00
Aurelien DARRAGON
47f14be9f3 MINOR: tools: only print address in sa2str() when port == -1
Support special value for port in sa2str: if port is equal to -1, only
print the address without the port, also ignoring <map_ports> value.
2025-03-12 10:51:20 +01:00
Aurelien DARRAGON
2de62d0461 MINOR: log: provide source address information in syslog_process_message()
provide struct sockaddr_storage pointer from the message sender in
syslog_process_message()
2025-03-12 10:50:30 +01:00
Aurelien DARRAGON
bc76f6dde9 MINOR: log: migrate log-forward options from proxy->options2 to options3
Migrate recently added log-forward section options, currently stored under
proxy->options2 to proxy->options3 since proxy->options2 is running out of
space and we plan on adding more log-forward options.
2025-03-12 10:50:03 +01:00
Aurelien DARRAGON
cc5a66212d MINOR: proxy: add proxy->options3
proxy->options2 is almost full, yet we will add new log-forward options
in upcoming patches so we anticipate that by adding a new {no_}options3
and cfg_opts3[] to further extend proxy options
2025-03-12 10:49:36 +01:00
Aurelien DARRAGON
d47e7103b8 CLEANUP: log: add syslog_process_message() helper
Prevent code duplication under syslog_fd_handler() and syslog_io_handler()
by merging common code path in a single syslog_process_message() helper
that processed a single message stored in <buf> according to <frontend>
settings.
2025-03-12 10:49:18 +01:00
Aurelien DARRAGON
8b8520305e CLEANUP: log-forward: remove useless options2 init
It is actually not required to zero out proxy->options2 since proxy is
allocated using calloc() which already does it.
2025-03-12 10:49:08 +01:00
William Lallemand
c6e6318125 CI: github: add "jose" to apt dependencies
jose is used in the JWS unit-test, let's add it to the CI.
2025-03-11 22:29:40 +01:00
William Lallemand
d014d7ee72 TESTS: jws: implement a test for JWS signing
This test returns a JWS payload signed a specified private key in the
PEM format, and uses the "jose" command tool to check if the signature
is correct against the jwk public key.

The test could be improved later by using the code from jwt.c allowing
to check a signature.
2025-03-11 22:29:40 +01:00