MEDIUM: cpu-topo: switch to the "performance" cpu-policy by default

As mentioned during the NUMA series development, the goal is to use
all available cores in the most efficient way by default, which
normally corresponds to "cpu-policy performance". The previous default,
"cpu-policy first-usable-node", was only meant to keep the behavior 100%
identical to what existed before cpu-policy was introduced.

So let's switch the default cpu-policy to "performance" right now.
The doc was updated to reflect this.
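
For deployments that want to keep the previous behavior, the old policy
remains available and can be requested explicitly. A minimal sketch, following
the documented syntax of the "cpu-policy" global directive:

    global
        cpu-policy first-usable-node
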
commit b74336984d
parent 5128178256
Author: Willy Tarreau
Date:   2025-06-26 16:01:55 +02:00
2 changed files with 14 additions and 18 deletions

@@ -2174,7 +2174,7 @@ cpu-policy <policy>
 
   The "cpu-policy" directive chooses between a small number of allocation
   policies which one to use instead, when "cpu-map" is not used. The following
-  policies are currently supported:
+  policies are currently supported, with "performance" being the default one:
 
     - none                no particular post-selection is performed. All enabled
                           CPUs will be usable, and if the number of threads is
@@ -2202,8 +2202,7 @@ cpu-policy <policy>
                           node with enabled CPUs will be used, and this number of
                           CPUs will be used as the number of threads. A single
                           thread group will be enabled with all of them, within
-                          the limit of 32 or 64 depending on the system. This is
-                          the default policy.
+                          the limit of 32 or 64 depending on the system.
 
     - group-by-2-ccx      same as "group-by-ccx" below but create a group every
                           two CCX. This can make sense on CPUs having many CCX of
@@ -2299,7 +2298,7 @@ cpu-policy <policy>
                           such as network handling is much more effective. On
                           development systems, these can also be used to run
                           auxiliary tools such as load generators and monitoring
-                          tools.
+                          tools. This is the default policy.
 
     - resource            this is like "group-by-cluster" above, except that only
                           the smallest and most efficient CPU cluster will be
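
The policy names listed above are used verbatim as the directive's argument;
for example, a configuration preferring one thread group per CCX over the new
default could state (a minimal sketch of the documented syntax):

    global
        cpu-policy group-by-ccx
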
@@ -2904,18 +2903,15 @@ no-quic
   processed by haproxy. See also "quic_enabled" sample fetch.
 
 numa-cpu-mapping
-  When running on a NUMA-aware platform with the cpu-policy is set to
-  "first-usable-node" (the default one), HAProxy inspects on startup the CPU
-  topology of the machine. If a multi-socket machine is detected, the affinity
-  is automatically calculated to run on the CPUs of a single node. This is done
-  in order to not suffer from the performance penalties caused by the
-  inter-socket bus latency. However, if the applied binding is non optimal on a
-  particular architecture, it can be disabled with the statement 'no
-  numa-cpu-mapping'. This automatic binding is also not applied if a nbthread
-  statement is present in the configuration, if the affinity of the process is
-  already specified, for example via the 'cpu-map' directive or the taskset
-  utility, or if the cpu-policy is set to any other value. See also "cpu-map",
-  "cpu-policy", "cpu-set".
+  When running on a NUMA-aware platform, this enables the "cpu-policy"
+  directive to inspect the topology and figure the best set of CPUs to use and
+  the corresponding number of threads. However, if the applied binding is non
+  optimal on a particular architecture, it can be disabled with the statement
+  'no numa-cpu-mapping'. This automatic binding is also not applied if a
+  'nbthread' statement is present in the configuration, if the affinity of the
+  process is already specified, for example via the 'cpu-map' directive or the
+  taskset utility, or if the cpu-policy is set to any other value. See also
+  "cpu-map", "cpu-policy", "cpu-set".
 
 ocsp-update.disable [ on | off ]
   Disable completely the ocsp-update in HAProxy. Any ocsp-update configuration
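
Per the rewritten paragraph above, the automatic NUMA mapping can be disabled
when it proves suboptimal on a given machine. A minimal sketch using the
documented negation form:

    global
        no numa-cpu-mapping
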

@@ -60,7 +60,7 @@ static int cpu_policy_resource(int policy, int tmin, int tmax, int gmin, int gmax
 
 static struct ha_cpu_policy ha_cpu_policy[] = {
         { .name = "none",               .desc = "use all available CPUs",                           .fct = NULL                          },
-        { .name = "first-usable-node",  .desc = "use only first usable node if nbthreads not set",  .fct = cpu_policy_first_usable_node, .arg = 0 },
+        { .name = "performance",        .desc = "make one thread group per perf. core cluster",     .fct = cpu_policy_performance      , .arg = 0 },
         { .name = "group-by-ccx",       .desc = "make one thread group per CCX",                    .fct = cpu_policy_group_by_ccx     , .arg = 1 },
         { .name = "group-by-2-ccx",     .desc = "make one thread group per 2 CCX",                  .fct = cpu_policy_group_by_ccx     , .arg = 2 },
         { .name = "group-by-3-ccx",     .desc = "make one thread group per 3 CCX",                  .fct = cpu_policy_group_by_ccx     , .arg = 3 },
@@ -69,9 +69,9 @@ static struct ha_cpu_policy ha_cpu_policy[] = {
         { .name = "group-by-cluster",   .desc = "make one thread group per core cluster",           .fct = cpu_policy_group_by_cluster , .arg = 1 },
         { .name = "group-by-2-clusters",.desc = "make one thread group per 2 core clusters",        .fct = cpu_policy_group_by_cluster , .arg = 2 },
         { .name = "group-by-3-clusters",.desc = "make one thread group per 3 core clusters",        .fct = cpu_policy_group_by_cluster , .arg = 3 },
         { .name = "group-by-4-clusters",.desc = "make one thread group per 4 core clusters",        .fct = cpu_policy_group_by_cluster , .arg = 4 },
-        { .name = "performance",        .desc = "make one thread group per perf. core cluster",     .fct = cpu_policy_performance      , .arg = 0 },
         { .name = "efficiency",         .desc = "make one thread group per eff. core cluster",      .fct = cpu_policy_efficiency       , .arg = 0 },
         { .name = "resource",           .desc = "make one thread group from the smallest cluster",  .fct = cpu_policy_resource         , .arg = 0 },
+        { .name = "first-usable-node",  .desc = "use only first usable node if nbthreads not set",  .fct = cpu_policy_first_usable_node, .arg = 0 },
         { 0 } /* end */
 };
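
Swapping the two entries rather than editing a default-name string is
consistent with the default policy being stored as an index into this table,
so that moving "performance" into slot 1 changes the default without touching
the parser. The sketch below illustrates that mechanism under this assumption;
the cpu_policy variable and the cpu_policy_lookup() helper are hypothetical
stand-ins rather than HAProxy's actual selection code, and the .fct/.arg
fields are dropped for brevity.

  #include <stdio.h>
  #include <string.h>

  /* Table shaped like ha_cpu_policy[] in the diff, trimmed to two entries. */
  struct ha_cpu_policy {
      const char *name;
      const char *desc;
  };

  static const struct ha_cpu_policy ha_cpu_policy[] = {
      { .name = "none",        .desc = "use all available CPUs" },
      { .name = "performance", .desc = "make one thread group per perf. core cluster" },
      /* ... remaining policies ... */
      { 0 } /* end */
  };

  /* hypothetical: the active policy kept as an index; slot 1 is the default */
  static int cpu_policy = 1;

  /* hypothetical helper: resolve a "cpu-policy <name>" argument to an index */
  static int cpu_policy_lookup(const char *name)
  {
      int idx;

      for (idx = 0; ha_cpu_policy[idx].name; idx++)
          if (strcmp(ha_cpu_policy[idx].name, name) == 0)
              return idx;
      return -1; /* unknown policy name */
  }

  int main(void)
  {
      printf("default policy: %s\n", ha_cpu_policy[cpu_policy].name);
      printf("\"none\" resolves to index %d\n", cpu_policy_lookup("none"));
      return 0;
  }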