haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-08-13 18:46:57 +02:00

Author	SHA1	Message	Date
Willy Tarreau	aaa4080b8b	CLEANUP: thread: now remove the temporary CPU node binding code This is now superseded by the default "safe" cpu-policy, and every time it's used, that code was bypassed anyway since global.nbthread was set. We can now safely remove it. Note that for other policies which do not set a thread count nor further restrict CPUs (such as "none", or even "safe" when finding a single node), we continue to go through the fallback code that automatically assigns CPUs to threads and counts them.	2025-03-14 18:33:16 +01:00
Willy Tarreau	56d939866b	MEDIUM: cpu-topo: use the "first-usable-node" cpu-policy by default This now turns the cpu-policy to "first-usable-node" by default, so that we preserve the current default behavior consisting in binding to the first node if nothing was forced. If a second node is found, global.nbthread is set and the previous code will be skipped.	2025-03-14 18:33:16 +01:00
Willy Tarreau	7fc6cdd0b1	MINOR: cpu-topo: add a 'first-usable-node' cpu policy This is a reimplementation of the current default policy. It binds to the first node having usable CPUs if found, and drops CPUs from the second and next nodes.	2025-03-14 18:33:16 +01:00
Willy Tarreau	156430ceb6	MINOR: cpu-topo: add a CPU policy setting to the global section We'll need to let the user decide what's best for their workload, and in order to do this we'll have to provide tunable options. For that, we're introducing struct ha_cpu_policy which contains a name, a description and a function pointer. The purpose will be to use that function pointer to choose the best CPUs to use and now to set the number of threads and thread-groups, that will be called during the thread setup phase. The only supported policy for now is "none" which doesn't set/touch anything (i.e. all available CPUs are used).	2025-03-14 18:33:16 +01:00
Willy Tarreau	9a8e8af11a	MINOR: cpu-topo: add "only-cluster" and "drop-cluster" to cpu-set These are processed after the topology is detected, and they allow to restrict binding to or evict CPUs matching the indicated hardware cluster number(s). It can be used to bind to only some clusters, such as CCX or different energy efficiency cores. For this reason, here we use the cluster's local ID (local to the node).	2025-03-14 18:33:16 +01:00
Willy Tarreau	a946cfa8b5	MINOR: cpu-topo: add "only-core" and "drop-core" to cpu-set These are processed after the topology is detected, and they allow to restrict binding to or evict CPUs matching the indicated hardware core number(s). It can be used to bind to only some clusters as well as to evict efficient cores whose number is known.	2025-03-14 18:33:16 +01:00
Willy Tarreau	c591c9d6a6	MINOR: cpu-topo: add "only-thread" and "drop-thread" to cpu-set These are processed after the topology is detected, and they allow to restrict binding to or evict CPUs matching the indicated hardware thread number(s). It can be used to reserve even threads for HW IRQs and odd threads for haproxy for example, or to evict efficient cores that do only have thread #0.	2025-03-14 18:33:16 +01:00
Willy Tarreau	c93ee25054	MINOR: cpu-topo: add "only-node" and "drop-node" to cpu-set These are processed after the topology is detected, and they allow to restrict binding to or evict CPUs matching the indicated node(s).	2025-03-14 18:33:16 +01:00
Willy Tarreau	7263366606	MINOR: cpu-topo: ignore excess of too small clusters On some Arm systems (typically A76/N1) where CPUs can be associated in pairs, clusters are reported while they have no impact on I/O etc. Yet it's possible to have tens of clusters of 2 CPUs each, which is counterproductive since it does not even allow to start enough threads. Let's detect this situation as soon as there are at least 4 clusters having each 2 CPUs or less, which is already very suspicious. In this case, all these clusters will be reset as meaningless. In the worst case if needed they'll be re-assigned based on L2/L3.	2025-03-14 18:33:12 +01:00
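The detection rule described in the commit above (at least 4 clusters of 2 CPUs or fewer) can be sketched as follows. This is only an illustration: the function name and the flat array of per-cluster CPU counts are hypothetical and are not taken from the HAProxy sources.

```c
/* Hypothetical helper, not HAProxy code: returns 1 when at least 4
 * clusters contain 2 CPUs or fewer, which is the threshold quoted in
 * the commit message, otherwise 0.
 */
static int too_many_small_clusters(const int *cpus_per_cluster, int nb_clusters)
{
	int i, small = 0;

	for (i = 0; i < nb_clusters; i++) {
		/* only count clusters that actually hold CPUs */
		if (cpus_per_cluster[i] > 0 && cpus_per_cluster[i] <= 2)
			small++;
	}
	return small >= 4;
}
```

With eight clusters of two CPUs each the function returns 1, while a machine with two clusters of eight CPUs returns 0.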
Willy Tarreau	aa4776210b	MINOR: cpu-topo: create an array of the clusters The goal here is to keep an array of the known CPU clusters, because we'll use that often to decide of the performance of a cluster and its relevance compared to other ones. We'll store the number of CPUs in it, the total capacity etc. For the capacity, we count one unit per core, and 1/3 of it per extra SMT thread, since this is roughly what has been measured on modern CPUs. In order to ease debugging, they're also dumped with -dc.	2025-03-14 18:30:31 +01:00
Willy Tarreau	204ac3c0b6	MINOR: cpu-topo: consider capacity when forming clusters By using the cluster+capacity sorting function we can detect heterogeneous clusters which are not properly reported. Thanks to this, the following misnumbered machine featuring 4 big cores, 4 medium ones and 4 small ones is properly detected with its clusters correctly assigned: [keep] thr= 0 -> cpu= 0 pk=00 no=00 cl=000 ts=000 capa=1024 [keep] thr= 1 -> cpu= 1 pk=00 no=00 cl=002 ts=008 capa=278 [keep] thr= 2 -> cpu= 2 pk=00 no=00 cl=002 ts=009 capa=278 [keep] thr= 3 -> cpu= 3 pk=00 no=00 cl=002 ts=010 capa=278 [keep] thr= 4 -> cpu= 4 pk=00 no=00 cl=002 ts=011 capa=278 [keep] thr= 5 -> cpu= 5 pk=00 no=00 cl=001 ts=004 capa=905 [keep] thr= 6 -> cpu= 6 pk=00 no=00 cl=001 ts=005 capa=905 [keep] thr= 7 -> cpu= 7 pk=00 no=00 cl=001 ts=006 capa=866 [keep] thr= 8 -> cpu= 8 pk=00 no=00 cl=001 ts=007 capa=866 [keep] thr= 9 -> cpu= 9 pk=00 no=00 cl=000 ts=001 capa=984 [keep] thr= 10 -> cpu= 10 pk=00 no=00 cl=000 ts=002 capa=984 [keep] thr= 11 -> cpu= 11 pk=00 no=00 cl=000 ts=003 capa=1024 Also this has the benefit of always assigning highest performance clusters with the smallest IDs so that simple configs can decide to simply bind to cluster 0 or clusters 0,1 and benefit from optimal performance.	2025-03-14 18:30:31 +01:00
Willy Tarreau	4a6eaf6c5e	MINOR: cpu-topo: add a function to sort by cluster+capacity The purpose here is to detect heterogeneous clusters which are not properly reported, based on the exposed information about the cores capacity. The algorithm here consists in sorting CPUs by capacity within a cluster, and considering as equal all those which have 5% or less difference in capacity with the previous one. This allows large clusters of more than 5% total between extremities, while keeping apart those where the limit is more pronounced. This is quite common in embedded environments with big.LITTLE systems, as well as on some laptops.	2025-03-14 18:30:31 +01:00
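A minimal sketch of the grouping rule described in the commit above: sort by capacity, then start a new group whenever a CPU is more than 5% below the previous one. The function names and the plain `int` arrays are hypothetical and only serve the illustration; the actual HAProxy code sorts its own CPU topology entries.

```c
#include <stdlib.h>

/* Hypothetical comparison function: sorts capacities in descending order. */
static int cmp_capa_desc(const void *a, const void *b)
{
	int ca = *(const int *)a;
	int cb = *(const int *)b;

	return (cb > ca) - (cb < ca);
}

/* Hypothetical helper, not HAProxy code: sorts capa[] in place (highest
 * first) and stores in grp[i] the group index of capa[i]. A new group
 * starts when an entry is more than 5% below the previous entry.
 * Returns the number of groups found.
 */
static int group_by_capacity(int *capa, int *grp, int n)
{
	int i, cur = 0;

	if (n <= 0)
		return 0;

	qsort(capa, n, sizeof(*capa), cmp_capa_desc);
	grp[0] = 0;
	for (i = 1; i < n; i++) {
		/* capa[i] < 95% of capa[i-1], using integer arithmetic */
		if (capa[i] * 100 < capa[i - 1] * 95)
			cur++;
		grp[i] = cur;
	}
	return cur + 1;
}
```

Fed with the twelve `capa=` values that appear in the dump of the "consider capacity when forming clusters" commit in this log (1024, 984, 905, 866 and 278), this sketch yields three groups of four CPUs, with the highest-capacity group numbered 0. Because each CPU is compared to its immediate predecessor, a group may span more than 5% between its extremities, as the commit message notes.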
Willy Tarreau	0290b807dd	MINOR: cpu-topo: renumber cores to avoid holes and make them contiguous Due to the way core numbers are assigned and the presence of SMT on some of them, some holes may remain in the array. Let's renumber them to plug holes once they're known, following pkg/node/die/llc etc, so that they're local to a (pkg,node) set. Now an i7-14700 shows cores 0 to 19, not 0 to 27.	2025-03-14 18:30:31 +01:00
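The renumbering idea in the commit above can be sketched as below for a single (pkg,node) set. This is an assumption-laden illustration: the function name is hypothetical, and the input is assumed to be already sorted, whereas the real code walks its topology array in pkg/node/die/llc order.

```c
/* Hypothetical helper, not HAProxy code: ids[] holds core numbers
 * sorted in ascending order, possibly with holes and with duplicates
 * (SMT threads sharing a core). Renumbers them in place so that the
 * distinct values become 0..k-1 and returns k.
 */
static int renumber_contiguous(int *ids, int n)
{
	int i, prev = 0, next = -1;

	for (i = 0; i < n; i++) {
		/* a new core starts at the first entry or when the ID changes */
		if (i == 0 || ids[i] != prev)
			next++;
		prev = ids[i];
		ids[i] = next;
	}
	return next + 1;
}
```

For example, the sequence 0, 0, 4, 4, 8, 9 becomes 0, 0, 1, 1, 2, 3, and the function reports 4 distinct cores.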
Willy Tarreau	b633b9d422	MINOR: cpu-topo: assign an L3 cache if more than 2 L2 instances On some machines, L3 is not always reported (e.g. on some lx2 or some armada8040). But some also don't have L3 (core 2 quad). However, no L3 when there are more than 2 L2 is quite unheard of, and while we don't really care about firing 2 thread groups for 2 L2, we'd rather avoid doing this if there are 8! In this case we'll declare an L3 instance to fix the situation. This allows small machines to continue to start with two groups without misbehaving on large ones.	2025-03-14 18:30:31 +01:00
Willy Tarreau	d169758fa9	MINOR: cpu-topo: make sure we don't leave unassigned IDs in the cpu_topo It's important that we don't leave unassigned IDs in the topology, because the selection mechanism is based on index-based masks, so an unassigned ID will never be kept. This is particularly visible on systems where we cannot access the CPU topology, the package id, node id and even thread id are set to -1, and all CPUs are evicted due to -1 not being set in the "only-cpu" sets. Here in new function "cpu_fixup_topology()", we assign them with the smallest unassigned value. This function will be used to assign IDs where missing in general.	2025-03-14 18:30:31 +01:00
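One way to picture the fix-up described in the commit above is shown below. It is a sketch under assumptions: the function name is hypothetical, a single ID field is handled as a plain array, and all unassigned entries (-1) are assumed to receive the same smallest unused value, which the commit message does not state explicitly.

```c
/* Hypothetical helper, not HAProxy code: replaces every negative
 * (unassigned) entry of ids[] with the smallest non-negative value
 * that no entry uses yet. Returns the number of entries fixed.
 */
static int fixup_unassigned_ids(int *ids, int n)
{
	int i, found, fixed = 0, next = 0;

	/* look for the smallest value not present in ids[] */
	do {
		found = 0;
		for (i = 0; i < n; i++) {
			if (ids[i] == next) {
				next++;
				found = 1;
				break;
			}
		}
	} while (found);

	for (i = 0; i < n; i++) {
		if (ids[i] < 0) {
			ids[i] = next;
			fixed++;
		}
	}
	return fixed;
}
```

When no topology information is available and every entry is -1, all entries end up with ID 0, so an index-based mask that includes 0 keeps them instead of evicting them.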
Willy Tarreau	af648c7b58	MINOR: cpu-topo: assign clusters to cores without and renumber them Due to the previous commit we can end up with cores not assigned any cluster ID. For this, at the end we sort the CPUs by topology and assign cluster IDs to remaining CPUs based on pkg/node/llc. For example an 14900 now shows 5 clusters, one for the 8 p-cores, and 4 of 4 e-cores each. The local cluster numbers are per (node,pkg) ID so that any rule could easily be applied on them, but we also keep the global numbers that will help with thread group assignment. We still need to force to assign distinct cluster IDs to cores running on a different L3. For example the EPYC 74F3 is reported as having 8 different L3s (which is true) and only one cluster. Here we introduce a new function "cpu_compose_clusters()" that is called from the main init code just after cpu_detect_topology() so that it's not OS-dependent. It deals with this renumbering of all clusters in topology order, taking care of considering any distinct LLC as being on a distinct cluster.	2025-03-14 18:30:31 +01:00
Willy Tarreau	385360fe81	MINOR: cpu-topo: ignore single-core clusters Some platforms (several armv7, intel 14900 etc) report one distinct cluster per core. This is problematic as it cannot let clusters be used to distinguish real groups of cores, and cannot be used to build thread groups. Let's just compare the cluster cpus to the siblings, and ignore it if they exactly match. We must also take care of not falling back to core_cpus_list, which can enumerate cores that already have their cluster assigned (e.g. intel 14900 has 4 4-Ecore clusters in addition to the 8 Pcores).	2025-03-14 18:30:31 +01:00
Willy Tarreau	a4471ea56d	MINOR: cpu-topo: implement a CPU sorting mechanism by cluster ID This will be used to detect and fix incorrect setups which report the same cluster ID for multiple L3 instances. The arrangement of functions in this file is becoming a real problem. Maybe we should move all this to cpu_topo for example, and better distinguish OS-specific and generic code.	2025-03-14 18:30:31 +01:00
Willy Tarreau	a8acdbd9fd	MINOR: cpu-topo: implement a sorting mechanism by CPU locality Once we've kept only the CPUs we want, the next step will be to form groups and these ones are based on locality. Thus we'll have to sort by locality. For now the locality is only inferred by the index. No grouping is made at this point. For this we add the "cpu_reorder_by_locality" function with a locality-based comparison function.	2025-03-14 18:30:31 +01:00
Willy Tarreau	18133a054d	MINOR: cpu-topo: implement a sorting mechanism for CPU index CPU selection will be performed by sorting CPUs according to various criteria. For dumps however, that's really not convenient and we'll need to reorder the CPUs according to their index only. This is what the new function cpu_reorder_by_index() does. It's called in thread_detect_count() before dumping the CPU topology.	2025-03-14 18:30:31 +01:00
Willy Tarreau	661d49a18a	MINOR: cpu-topo: skip CPU properties that we've verified do not exist A number of entries under /cpu/cpu%d only exist on certain kernel versions, certain archs and/or with certain modules loaded. It's pointless to insist on trying to read them all for all CPUs when we've already verified they do not exist. Thus let's use stat() the first time prior to checking some of them, and only try to access them when they really exist. This almost completely eliminates the large number of ENOENT that was visible in strace during startup.	2025-03-14 18:30:31 +01:00
Willy Tarreau	baeea08dba	MINOR: cpu-topo: skip identification of non-existing CPUs There's no point trying to read all entries under /cpu/cpu%d when that one does not exist, so let's just skip it in this case.	2025-03-14 18:30:31 +01:00
Willy Tarreau	8542c79f9d	MINOR: cpu-topo: skip CPU detection when /sys/.../cpu does not exist There's no point scanning all entries when /cpu doesn't exist in the first place. Let's check once for it and skip the loop in this case.	2025-03-14 18:30:30 +01:00
Willy Tarreau	c5ddf4a5b2	MINOR: cpu-topo: boost the capacity of performance cores with cpufreq Cpufreq alone isn't a good metric on heterogeneous CPUs because efficient cores can reach almost as high frequencies as performant ones. Tests have shown that boosting the capacity of performance cores by 50% gives a pretty accurate estimate of the performance to expect on modern CPUs, and that counting +33% per extra SMT thread is reasonable as well. We don't have the info about the core's quality, but using the presence of SMT is a reasonable approach in this case, given that efficiency cores will not use it. As an example, using one thread of each of the 8 P-cores of an intel i9-14900k gives 39.5k rps for a corrected total capacity of 69.3k, using the 16 E-cores gives 40.5k for a total capacity of 70.4k, and using both threads of 6 P-cores gives 41.1k for a total capacity of 69.6k. Thus the 3 same scores deliver the same performance in various combinations.	2025-03-14 18:30:30 +01:00
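The capacity totals quoted in the commit above can be reproduced with a small sketch. Several things here are assumptions rather than facts from the commit: the function name is hypothetical, "+33%" is treated as one third, and the per-core frequencies (5.7 GHz and 6.0 GHz P-cores, 4.4 GHz E-cores) are borrowed from the i9-14900 figures in the "fall back to nominal_perf and scaling_max_freq" commit of this log, with two of the P-cores assumed to run at 6.0 GHz.

```c
/* Hypothetical helper, not HAProxy code: estimates the capacity of one
 * core from its maximum frequency in MHz.
 *  - cores that support SMT are treated as performance cores: +50%
 *  - each extra SMT thread in use adds one third of that value
 */
static unsigned int boosted_capacity(unsigned int freq_mhz, int has_smt,
                                     int threads_used)
{
	unsigned int capa = freq_mhz;

	if (has_smt)
		capa += capa / 2;
	if (threads_used > 1)
		capa += (threads_used - 1) * capa / 3;
	return capa;
}
```

Under these assumptions, one thread on each of 8 P-cores gives 6 x 8550 + 2 x 9000 = 69300, the 16 E-cores give 16 x 4400 = 70400, and both threads of 6 P-cores (four at 5.7 GHz, two at 6.0 GHz) give 4 x 11400 + 2 x 12000 = 69600, which matches the 69.3k, 70.4k and 69.6k totals in the commit message.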
Willy Tarreau	e4aa13e786	MINOR: cpu-topo: use cpufreq before acpi cppc The acpi_cppc method was found to take about 5ms per CPU on a 64-core EPYC system, which is plain unacceptable as it delays the boot by half a second. Let's use the less accurate cpufreq first, which should be sufficient anyway since many systems do not have acpi_cppc. We'll only fall back to acpi_cppc for systems without cpufreq. If it were to be an issue over time, we could also automatically consider that all threads of the same core or even of the same cluster run at the same speed (when a cluster is known to be accurate).	2025-03-14 18:30:30 +01:00
Willy Tarreau	d11241b7ba	MINOR: cpu-topo: fall back to nominal_perf and scaling_max_freq for the capacity When cpu_capacity is not present, let's try to check acpi_cppc's nominal_perf which is similar and commonly found on servers, then scaling_max_freq (though that last one may vary a bit between CPUs depending on die quality). That variation is not a problem since we can absorb a ~5% variation without issue. It was verified on an i9-14900 featuring 5.7-P, 6.0-P and 4.4-E GHz that P-cores were not reordered and that E cores were placed last. It was also OK on a W3-2345 with 4.3 to 4.5GHz.	2025-03-14 18:30:30 +01:00
Willy Tarreau	322c28cc19	MINOR: cpu-topo: refine cpu dump output to better show kept/dropped CPUs It's becoming difficult to see which CPUs are going to be kept/dropped. Let's just skip all offline CPUs, and indicate "keep" in front of those that are going to be used, and "----" in front of the excluded ones. It is way more readable this way. Also let's just drop the array entry number, since it's always the same as the CPU number and is only an internal representation anyway.	2025-03-14 18:30:30 +01:00
Willy Tarreau	f1210ee7c6	MEDIUM: cfgparse: remove now unused numa & thread-count detection This is not needed anymore since already done before landing here via thread_detect_count().	2025-03-14 18:30:30 +01:00
Willy Tarreau	e3aef4c9a4	MEDIUM: thread: reimplement first numa node detection Let's reimplement automatic binding to the first NUMA node when thread count is not forced. It's the same thing as is already done in check_config_validity() except that this time it's based on the collected CPU information. The threads are automatically counted and CPUs from non-first node(s) are evicted.	2025-03-14 18:30:30 +01:00
Willy Tarreau	4a525e8d27	MEDIUM: cpu-topo: make sure to properly assign CPUs to threads as a fallback If no cpu-map is done and no cpu-policy could be enforced, we still need to count the number of usable CPUs, assign them to all threads and set the nbthread value accordingly. This already handles the part that was done in check_config_validity() via thread_cpus_enabled_at_boot.	2025-03-14 18:30:30 +01:00
Willy Tarreau	1af4942c95	MEDIUM: thread: start to detect thread groups and threads min/max By mutually refining the thread count and group count, we can try to detect the most suitable setup for the current machine. Taskset is implicitly handled correctly. tgroups automatically adapt to the configured number of threads. cpu-map manages to limit tgroups to the smallest supported value. The thread-limit is enforced. Just like in cfgparse, if the thread count was forced to a higher value, it's reduced and a warning is emitted. But if it was not set, the thr_max value is bound to this limit so that further calculations respect it. We continue to default to the max number of available threads and 1 tgroup by default, with the limit. This normally allows to get rid of that test in check_config_validity().	2025-03-14 18:30:30 +01:00
Willy Tarreau	68069e4b27	MINOR: cpu-topo: add "drop-cpu" and "only-cpu" to cpu-set These allow respectively to disable binding to CPUs listed in a set, and to disable binding to CPUs not in a set.	2025-03-14 18:30:30 +01:00
Willy Tarreau	cda4956d9c	MINOR: cpu-topo: add a new "cpu-set" global directive to choose cpus For now it's limited, it only supports "reset" to ask that any previous "taskset" be ignored. The goal will be to later add more actions that allow to symbolically define sets of cpus to bind to or to drop. This also clears the cpu_mask_forced variable that is used to detect that a taskset had been used.	2025-03-14 18:30:30 +01:00
Willy Tarreau	f0661e79fe	MINOR: global: add a command-line option to enable CPU binding debugging During development, everything related to CPU binding and the CPU topology is debugged using state dumps at various places, but it does make sense to have a real command line option so that this remains usable in production to help users figure why some CPUs are not used by default. Let's add "-dc" for this. Since the list of global.tune.options values is almost full and does not 100% match this option, let's add a new "tune.debug" field for this.	2025-03-14 18:30:30 +01:00
Willy Tarreau	94543d7b65	MINOR: cfgparse: use already known offline CPU information No need to reparse cpu/online, let's just rely on the info we learned previously about offline CPUs.	2025-03-14 18:30:30 +01:00
Willy Tarreau	1560827c9d	MINOR: cfgparse: move the binding detection into numa_detect_topology() For now the function refrains from detecting the CPU topology when a restrictive taskset or cpu-map was already performed on the process, and it's documented as such, the reason being that until we're able to automatically create groups, better not change user settings. But we'll need to be able to detect bound CPUs and to process them as desired by the user, so we now need to move that detection into the function itself. It changes nothing to the logic, just gives more freedom to the function.	2025-03-14 18:30:30 +01:00
Willy Tarreau	ac1db9db7d	MINOR: thread: turn thread_cpu_mask_forced() into an init-time variable The function is not convenient because it doesn't allow us to undo the startup changes, and depending on where it's being used, we don't know whether the values read have already been altered (this is not the case right now but it's going to evolve). Let's just compute the status during cpu_detect_usable() and set a variable accordingly. This way we'll always read the init value, and if needed we can even afford to reset it. Also, placing it in cpu_topo.c limits cross-file dependencies (e.g. threads without affinity etc).	2025-03-14 18:30:30 +01:00
Willy Tarreau	3a7cc676fa	MINOR: cpu-topo: add NUMA node identification to CPUs on FreeBSD With this patch we're also assigning NUMA node IDs to each CPU when the info is found. The code is highly inspired from the one in commit `f5d48f8b3` ("MEDIUM: cfgparse: numa detect topology on FreeBSD."), the difference being that we're just setting the value in ha_cpu_topo[].	2025-03-14 18:30:30 +01:00
Willy Tarreau	f6154c079e	MINOR: cpu-topo: add NUMA node identification to CPUs on Linux With this patch we're also assigning NUMA node IDs to each CPU when one is found. The code is highly inspired from the one in commit `b56a7c89a` ("MEDIUM: cfgparse: detect numa and set affinity if needed") that already did the job, except that it could be simplified since we're just collecting info to fill the ha_cpu_topo[] array.	2025-03-14 18:30:30 +01:00
Willy Tarreau	65612369e7	MINOR: cpu-topo: also store the sibling ID with SMT The sibling ID was not reported because it's not directly accessible but we don't care, what matters is that we assign numbers to all the threads we find using the same CPU so that some strategies permit to allocate one thread at a time if we want to use few threads with max performance.	2025-03-14 18:30:30 +01:00
Willy Tarreau	7cb274439b	MINOR: cpu-topo: add CPU topology detection for linux This uses the publicly available information from /sys to figure the cache and package arrangements between logical CPUs and fill ha_cpu_topo[], as well as their SMT capabilities and relative capacity for those which expose this. The functions clearly have to be OS-specific.	2025-03-14 18:30:30 +01:00
Willy Tarreau	12f3a2bbb7	MINOR: cpu-topo: try to detect offline cpus at boot When possible, the offline CPUs are detected at boot and their OFFLINE flag is set in the ha_cpu_topo[] array. When the detection is not possible (e.g. not linux, /sys not mounted etc), we just mark none of them as being offline, as we don't want to infer wrong info that could hinder automatic CPU placement detection. When valid, we take this opportunity for refining cpu_topo_lastcpu so that we don't need to manipulate CPUs beyond this value.	2025-03-14 18:30:30 +01:00
Willy Tarreau	44881e5abf	MINOR: cpu-topo: add detection of online CPUs on FreeBSD On FreeBSD we can detect online CPUs at least by doing the bitwise-OR of the CPUs of all domains, so we're using this and adding this detection to ha_cpuset_detect_online(). If we find simpler later, we can always rework it, but it's reasonably inexpensive since we only check existing domains.	2025-03-14 18:30:30 +01:00
Willy Tarreau	8f72ce335a	MINOR: cpu-topo: add detection of online CPUs on Linux This adds a generic function ha_cpuset_detect_online() which for now only supports linux via /sys. It fills a cpuset with the list of online CPUs that were detected (or returns a failure).	2025-03-14 18:30:30 +01:00
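On Linux, the kernel exposes the online CPUs as a range list such as `0-3,8,10-11` in `/sys/devices/system/cpu/online`. The sketch below shows how such a list could be turned into a bit mask. It is not the code behind ha_cpuset_detect_online(): the function name is hypothetical, and a single 64-bit mask is used for brevity where the real code fills a cpuset.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not HAProxy code: parses a kernel-style CPU
 * list ("0-3,8,10-11", optionally ending with a newline) into a
 * 64-bit mask. Returns 0 on success, -1 on a malformed list or when a
 * CPU number does not fit in the mask.
 */
static int parse_cpu_list(const char *s, uint64_t *mask)
{
	*mask = 0;
	while (*s && *s != '\n') {
		unsigned long lo, hi;
		char *end;

		lo = hi = strtoul(s, &end, 10);
		if (end == s)
			return -1;
		if (*end == '-') {
			/* a range: parse its upper bound */
			s = end + 1;
			hi = strtoul(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (lo > hi || hi >= 64)
			return -1;
		for (; lo <= hi; lo++)
			*mask |= (uint64_t)1 << lo;
		s = end;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	return 0;
}
```

A caller would typically read the first line of the sysfs file into a buffer, pass it to this function, and report a detection failure when either the read or the parsing fails.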
Willy Tarreau	8c524c7c9d	REORG: cpu-topo: move bound cpu detection from cpuset to cpu-topo The cpuset files are normally used only for cpu manipulations. It happens that the initial CPU binding detection was initially placed there since there was no better place, but in practice, being OS-specific, it should really be in cpu-topo. This simplifies cpuset which doesn't need to know about the OS anymore.	2025-03-14 18:30:30 +01:00
Willy Tarreau	a6fdc3eaf0	MINOR: cpu-topo: update CPU topology from excluded CPUs at boot Now before trying to resolve the thread assignment to groups, we detect which CPUs are not bound at boot so that we can mark them with HA_CPU_F_EXCLUDED. This will be useful to better know on which CPUs we can count later. Note that we purposely ignore cpu-map here as we don't know how threads and groups will map to cpu-map entries, hence which CPUs will really be used. It's important to proceed this way so that when we have no info we assume they're all available.	2025-03-14 18:30:30 +01:00
Willy Tarreau	bdb731172c	MINOR: cpu-topo: add a function to dump CPU topology The new function cpu_dump_topology() will centralize most debugging calls, and it can make efforts of not dumping some possibly irrelevant fields (e.g. non-existing cache levels).	2025-03-14 18:30:30 +01:00
Willy Tarreau	041462c4af	MINOR: cpu-topo: rely on _SC_NPROCESSORS_CONF to trim maxcpus We don't want to constantly deal with as many CPUs as a cpuset can hold, so let's first try to trim the value to what the system claims to support via _SC_NPROCESSORS_CONF. It is obviously still subject to the limit of the cpuset size though. The value is stored globally so that we can reuse it elsewhere after initialization.	2025-03-14 18:30:30 +01:00
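The trimming described in the commit above can be sketched with `sysconf()`. The constant and function names are hypothetical; only `sysconf(_SC_NPROCESSORS_CONF)` itself is a real call, available on Linux, the BSDs and macOS even though it is not mandated by POSIX.

```c
#include <unistd.h>

/* Hypothetical limit standing in for the number of CPUs a cpuset can hold. */
#define SKETCH_MAX_CPUS 4096

/* Hypothetical helper, not HAProxy code: keeps the value reported by
 * the system when it is usable, otherwise falls back to the cpuset size.
 */
static int trim_maxcpus(long reported, int set_size)
{
	if (reported < 1 || reported > set_size)
		return set_size;
	return (int)reported;
}

/* Queries the system once; the result can then be stored globally. */
static int detect_maxcpus(void)
{
	return trim_maxcpus(sysconf(_SC_NPROCESSORS_CONF), SKETCH_MAX_CPUS);
}
```

Splitting the clamping logic from the system call keeps the former easy to check in isolation: a reported value of 16 is kept as is, while -1 (failure) or a value larger than the set size both fall back to the set size.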
Willy Tarreau	656cedad42	MINOR: cpu-topo: allocate and initialize the ha_cpu_topo array. This does the bare minimum to allocate and initialize a global ha_cpu_topo array for the number of supported CPUs and release it at deinit time.	2025-03-14 18:30:30 +01:00
Willy Tarreau	d165f5d3ab	MINOR: cpu-topo: add ha_cpu_topo definition This structure will be used to store information about each CPU's topology (package ID, L3 cache ID, NUMA node ID etc). This will be used in conjunction with CPU affinity setting to try to perform a mostly optimal binding between threads and CPU numbers by default. Since it was noticed during tests that absolutely none of the many machines tested reports different die numbers, the die_id is not stored. Also, it was found along experiments that the cluster ID will be used a lot, half of the time as a node-local identifier, and half of the time as a global identifier. So let's store the two versions at once (cl_gid, cl_lid). Some flags are added to indicate causes of exclusion (offline, excluded at boot, excluded by rules, ignored by policy).	2025-03-14 18:30:30 +01:00

24126 Commits