haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-11-15 07:51:04 +01:00

Author	SHA1	Message	Date
Willy Tarreau	156430ceb6	MINOR: cpu-topo: add a CPU policy setting to the global section We'll need to let the user decide what's best for their workload, and in order to do this we'll have to provide tunable options. For that, we're introducing struct ha_cpu_policy which contains a name, a description and a function pointer. The purpose will be to use that function pointer to choose the best CPUs to use and now to set the number of threads and thread-groups, that will be called during the thread setup phase. The only supported policy for now is "none" which doesn't set/touch anything (i.e. all available CPUs are used).	2025-03-14 18:33:16 +01:00
Willy Tarreau	c93ee25054	MINOR: cpu-topo: add "only-node" and "drop-node" to cpu-set These are processed after the topology is detected, and they allow to restrict binding to or evict CPUs matching the indicated node(s).	2025-03-14 18:33:16 +01:00
Willy Tarreau	aa4776210b	MINOR: cpu-topo: create an array of the clusters The goal here is to keep an array of the known CPU clusters, because we'll use that often to decide of the performance of a cluster and its relevance compared to other ones. We'll store the number of CPUs in it, the total capacity etc. For the capacity, we count one unit per core, and 1/3 of it per extra SMT thread, since this is roughly what has been measured on modern CPUs. In order to ease debugging, they're also dumped with -dc.	2025-03-14 18:30:31 +01:00
Willy Tarreau	4a6eaf6c5e	MINOR: cpu-topo: add a function to sort by cluster+capacity The purpose here is to detect heterogenous clusters which are not properly reported, based on the exposed information about the cores capacity. The algorithm here consists in sorting CPUs by capacity within a cluster, and considering as equal all those which have 5% or less difference in capacity with the previous one. This allows large clusters of more than 5% total between extremities, while keeping apart those where the limit is more pronounced. This is quite common in embedded environments with big.little systems, as well as on some laptops.	2025-03-14 18:30:31 +01:00
Willy Tarreau	d169758fa9	MINOR: cpu-topo: make sure we don't leave unassigned IDs in the cpu_topo It's important that we don't leave unassigned IDs in the topology, because the selection mechanism is based on index-based masks, so an unassigned ID will never be kept. This is particularly visible on systems where we cannot access the CPU topology, the package id, node id and even thread id are set to -1, and all CPUs are evicted due to -1 not being set in the "only-cpu" sets. Here in new function "cpu_fixup_topology()", we assign them with the smallest unassigned value. This function will be used to assign IDs where missing in general.	2025-03-14 18:30:31 +01:00
Willy Tarreau	af648c7b58	MINOR: cpu-topo: assign clusters to cores without and renumber them Due to the previous commit we can end up with cores not assigned any cluster ID. For this, at the end we sort the CPUs by topology and assign cluster IDs to remaining CPUs based on pkg/node/llc. For example an 14900 now shows 5 clusters, one for the 8 p-cores, and 4 of 4 e-cores each. The local cluster numbers are per (node,pkg) ID so that any rule could easily be applied on them, but we also keep the global numbers that will help with thread group assignment. We still need to force to assign distinct cluster IDs to cores running on a different L3. For example the EPYC 74F3 is reported as having 8 different L3s (which is true) and only one cluster. Here we introduce a new function "cpu_compose_clusters()" that is called from the main init code just after cpu_detect_topology() so that it's not OS-dependent. It deals with this renumbering of all clusters in topology order, taking care of considering any distinct LLC as being on a distinct cluster.	2025-03-14 18:30:31 +01:00
Willy Tarreau	a4471ea56d	MINOR: cpu-topo: implement a CPU sorting mechanism by cluster ID This will be used to detect and fix incorrect setups which report the same cluster ID for multiple L3 instances. The arrangement of functions in this file is becoming a real problem. Maybe we should move all this to cpu_topo for example, and better distinguish OS-specific and generic code.	2025-03-14 18:30:31 +01:00
Willy Tarreau	a8acdbd9fd	MINOR: cpu-topo: implement a sorting mechanism by CPU locality Once we've kept only the CPUs we want, the next step will be to form groups and these ones are based on locality. Thus we'll have to sort by locality. For now the locality is only inferred by the index. No grouping is made at this point. For this we add the "cpu_reorder_by_locality" function with a locality-based comparison function.	2025-03-14 18:30:31 +01:00
Willy Tarreau	18133a054d	MINOR: cpu-topo: implement a sorting mechanism for CPU index CPU selection will be performed by sorting CPUs according to various criteria. For dumps however, that's really not convenient and we'll need to reorder the CPUs according to their index only. This is what the new function cpu_reorder_by_index() does. It's called in thread_detect_count() before dumping the CPU topology.	2025-03-14 18:30:31 +01:00
Willy Tarreau	1af4942c95	MEDIUM: thread: start to detect thread groups and threads min/max By mutually refining the thread count and group count, we can try to detect the most suitable setup for the current machine. Taskset is implicitly handled correctly. tgroups automatically adapt to the configured number of threads. cpu-map manages to limit tgroups to the smallest supported value. The thread-limit is enforced. Just like in cfgparse, if the thread count was forced to a higher value, it's reduced and a warning is emitted. But if it was not set, the thr_max value is bound to this limit so that further calculations respect it. We continue to default to the max number of available threads and 1 tgroup by default, with the limit. This normally allows to get rid of that test in check_config_validity().	2025-03-14 18:30:30 +01:00
Willy Tarreau	f0661e79fe	MINOR: global: add a command-line option to enable CPU binding debugging During development, everything related to CPU binding and the CPU topology is debugged using state dumps at various places, but it does make sense to have a real command line option so that this remains usable in production to help users figure why some CPUs are not used by default. Let's add "-dc" for this. Since the list of global.tune.options values is almost full and does not 100% match this option, let's add a new "tune.debug" field for this.	2025-03-14 18:30:30 +01:00
Willy Tarreau	ac1db9db7d	MINOR: thread: turn thread_cpu_mask_forced() into an init-time variable The function is not convenient because it doesn't allow us to undo the startup changes, and depending on where it's being used, we don't know whether the values read have already been altered (this is not the case right now but it's going to evolve). Let's just compute the status during cpu_detect_usable() and set a variable accordingly. This way we'll always read the init value, and if needed we can even afford to reset it. Also, placing it in cpu_topo.c limits cross-file dependencies (e.g. threads without affinity etc).	2025-03-14 18:30:30 +01:00
Willy Tarreau	7cb274439b	MINOR: cpu-topo: add CPU topology detection for linux This uses the publicly available information from /sys to figure the cache and package arrangements between logical CPUs and fill ha_cpu_topo[], as well as their SMT capabilities and relative capacity for those which expose this. The functions clearly have to be OS-specific.	2025-03-14 18:30:30 +01:00
Willy Tarreau	8f72ce335a	MINOR: cpu-topo: add detection of online CPUs on Linux This adds a generic function ha_cpuset_detect_online() which for now only supports linux via /sys. It fills a cpuset with the list of online CPUs that were detected (or returns a failure).	2025-03-14 18:30:30 +01:00
Willy Tarreau	8c524c7c9d	REORG: cpu-topo: move bound cpu detection from cpuset to cpu-topo The cpuset files are normally used only for cpu manipulations. It happens that the initial CPU binding detection was initially placed there since there was no better place, but in practice, being OS-specific, it should really be in cpu-topo. This simplifies cpuset which doesn't need to know about the OS anymore.	2025-03-14 18:30:30 +01:00
Willy Tarreau	a6fdc3eaf0	MINOR: cpu-topo: update CPU topology from excluded CPUs at boot Now before trying to resolve the thread assignment to groups, we detect which CPUs are not bound at boot so that we can mark them with HA_CPU_F_EXCLUDED. This will be useful to better know on which CPUs we can count later. Note that we purposely ignore cpu-map here as we don't know how threads and groups will map to cpu-map entries, hence which CPUs will really be used. It's important to proceed this way so that when we have no info we assume they're all available.	2025-03-14 18:30:30 +01:00
Willy Tarreau	bdb731172c	MINOR: cpu-topo: add a function to dump CPU topology The new function cpu_dump_topology() will centralize most debugging calls, and it can make efforts of not dumping some possibly irrelevant fields (e.g. non-existing cache levels).	2025-03-14 18:30:30 +01:00
Willy Tarreau	041462c4af	MINOR: cpu-topo: rely on _SC_NPROCESSORS_CONF to trim maxcpus We don't want to constantly deal with as many CPUs as a cpuset can hold, so let's first try to trim the value to what the system claims to support via _SC_NPROCESSORS_CONF. It is obviously still subject to the limit of the cpuset size though. The value is stored globally so that we can reuse it elsewhere after initialization.	2025-03-14 18:30:30 +01:00
Willy Tarreau	656cedad42	MINOR: cpu-topo: allocate and initialize the ha_cpu_topo array. This does the bare minimum to allocate and initialize a global ha_cpu_topo array for the number of supported CPUs and release it at deinit time.	2025-03-14 18:30:30 +01:00
Willy Tarreau	d165f5d3ab	MINOR: cpu-topo: add ha_cpu_topo definition This structure will be used to store information about each CPU's topology (package ID, L3 cache ID, NUMA node ID etc). This will be used in conjunction with CPU affinity setting to try to perform a mostly optimal binding between threads and CPU numbers by default. Since it was noticed during tests that absolutely none of the many machines tested reports different die numbers, the die_id is not stored. Also, it was found along experiments that the cluster ID will be used a lot, half of the time as a node-local identifier, and half of the time as a global identifier. So let's store the two versions at once (cl_gid, cl_lid). Some flags are added to indicate causes of exclusion (offline, excluded at boot, excluded by rules, ignored by policy).	2025-03-14 18:30:30 +01:00
Willy Tarreau	69ac4cd315	MINOR: compiler: add a new __decl_thread_var() macro to declare local variables __decl_thread() already exists but is more suited for struct members. When using it in a variables block, it appends the final trailing semi-colon which is a statement that ends the variable block. Better clean this up and have one precisely for variable blocks. In this case we can simply define an unused enum value that will consume the semi-colon. That's what the new macro __decl_thread_var() does.	2025-03-12 18:08:12 +01:00
Willy Tarreau	bb4addabb7	MINOR: compiler: add a simple macro to concatenate resolved strings It's often useful to be able to concatenate strings after resolving them (e.g. __FILE__, __LINE__ etc). Let's just have a CONCAT() macro to do that, which calls _CONCAT() with the same arguments to make sure the contents are resolved before being concatenated.	2025-03-12 18:06:55 +01:00
Aurelien DARRAGON	003fe530ae	MINOR: log: add "option host" log-forward option add only the parsing part, options are currently unused	2025-03-12 10:51:35 +01:00
Aurelien DARRAGON	47f14be9f3	MINOR: tools: only print address in sa2str() when port == -1 Support special value for port in sa2str: if port is equal to -1, only print the address without the port, also ignoring <map_ports> value.	2025-03-12 10:51:20 +01:00
Aurelien DARRAGON	bc76f6dde9	MINOR: log: migrate log-forward options from proxy->options2 to options3 Migrate recently added log-forward section options, currently stored under proxy->options2 to proxy->options3 since proxy->options2 is running out of space and we plan on adding more log-forward options.	2025-03-12 10:50:03 +01:00
Aurelien DARRAGON	cc5a66212d	MINOR: proxy: add proxy->options3 proxy->options2 is almost full, yet we will add new log-forward options in upcoming patches so we anticipate that by adding a new {no_}options3 and cfg_opts3[] to further extend proxy options	2025-03-12 10:49:36 +01:00
Amaury Denoyelle	dc7913d814	MAJOR: mux-quic: increase stream flow-control for multi-buffer alloc Support for multiple Rx buffers per QCS instance has been introduced by previous patches. However, due to flow-control initial values, clients were still unable to fully use this to increase their upload throughput. This patch increases max-stream-data-bidi-remote flow-control initial values. A new define QMUX_STREAM_RX_BUF_FACTOR will fix the number of concurrent buffers allocable per QCS. It is set to 90. Note that connection flow-control initial value did not change. It is still configured to be equivalent to bufsize multiplied by the maximum concurrent streams. This ensures that Rx buffers allocation is still constrained per connection, so that it won't be possible to have all active QCS instances using in parallel their maximum Rx buffers count.	2025-03-07 12:06:27 +01:00
Amaury Denoyelle	a4f31ffeeb	MINOR: mux-quic: store QCS Rx buf in a single-entry tree Convert QCS rx buffer pointer to a tree container. Additionally, offset field of qc_stream_rxbuf is thus transformed into a node tree. For now, only a single Rx buffer is stored at most in QCS tree. Multiple Rx buffers will be implemented in a future patch to improve QUIC clients upload throughput.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	cc3c2d1f12	MINOR: mux-quic: define rxbuf wrapper Define a new type qc_stream_rxbuf. This is used as a wrapper around QCS Rx buffer with encapsulation of the ncbuf storage. It is allocated via a new pool. Several functions are adapted to be able to deal with qc_stream_rxbuf as a wrapper instead of the previous plain ncbuf instance. No functional change should happen with this patch. For now, only a single qc_stream_rxbuf can be instantiated per QCS. However, this new type will be useful to implement multiple Rx buffer storage in a future commit.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	4b1e63d191	MINOR: mux-quic: define globally stream rxbuf size QCS uses ncbuf for STREAM data storage. This serves as a limit for maximum STREAM buffering capacity, advertised via QUIC transport parameters for initial flow-control values. Define a new function qmux_stream_rx_bufsz() which can be used to retrieve this Rx buffer size. This can be used both in MUX/H3 layers and in QUIC transport parameters.	2025-03-07 12:06:26 +01:00
Amaury Denoyelle	861b11334c	MINOR: h3/hq-interop: restore function for standalone FIN receive Previously, a function qcs_http_handle_standalone_fin() was implemented to handle a received standalone FIN, bypassing app_ops layer decoding. However, this was removed as app_ops layer interaction is necessary. For example, HTTP/3 checks that FIN is never sent on the control uni stream. This patch reintroduces qcs_http_handle_standalone_fin(), albeit in a slightly diminished version. Most importantly, it is now the responsibility of the app_ops layer itself to use it, to avoid the shortcoming described above. The main objective of this patch is to be able to support standalone FIN in HTTP/0.9 layer. This is easily done via the reintroduction of qcs_http_handle_standalone_fin() usage. This will be useful to perform testing, as standalone FIN is a corner case which can easily be broken.	2025-03-07 12:06:26 +01:00
Valentine Krasnobaeva	e900ef987e	BUG/MEDIUM: startup: return to initial cwd only after check_config_validity() In check_config_validity() we evaluate some sample fetch expressions (log-format, server rules, etc). These expressions may use external files like maps. If some particular 'default-path' was set in the global section before, it's no longer applied to resolve file paths in check_config_validity(). parse_cfg() at the end of config parsing switches back to the initial cwd. This fixes the issue #2886. This patch should be backported in all stable versions since 2.4.0, including 2.4.0.	2025-03-06 10:49:48 +01:00
Roberto Moreda	f98b5c4f59	MINOR: log: add dont-parse-log and assume-rfc6587-ntf options This commit introduces the dont-parse-log option to disable log message parsing, allowing raw log data to be forwarded without modification. Also, it adds the assume-rfc6587-ntf option to frame log messages using only non-transparent framing as per RFC 6587. This avoids misparsing in certain cases (mainly with non-RFC-compliant messages). The documentation is updated to include details on the new options and their intended use cases. This feature was discussed in GH #2856	2025-03-06 09:30:39 +01:00
Aurelien DARRAGON	0746f6bde0	MINOR: cfgparse-listen: add and use cfg_parse_listen_match_option() helper cfg_parse_listen_match_option() takes cfg_opt array as parameter, as well as current args, expected mode and cap bitfields. It is expected to be used under cfg_parse_listen() function or similar. Its goal is to remove code duplication around proxy->options and proxy->options2 handling, since the same checks are performed for the two. Also, this function could help to evaluate proxy options for mode-specific proxies such as log-forward section for instance: by giving the expected mode and capability as input, the function would only match compatible options.	2025-03-06 09:30:18 +01:00
Aurelien DARRAGON	d9aa199100	MINOR: proxy: make pr_mode enum bitfield compatible Current pr_mode enum is a regular enum because a proxy only supports one mode at a time. However it can be handy for a function to be given a list of compatible modes for a proxy, and we can't do that using a bitfield because pr_mode is not bitfield compatible (values share the same bits). In this patch we manually define pr_mode values so that they are all using separate bits and allows a function to take a bitfield of compatible modes as parameter.	2025-03-06 09:30:11 +01:00
Olivier Houchard	335ef3264b	DEBUG: init: Add a macro to register unit tests Add a new macro, REGISTER_UNITTEST(), that will automatically make sure we call hap_register_unittest(), instead of having to create a function that will do so.	2025-03-04 18:18:10 +01:00
William Lallemand	a647839954	DEBUG: init: add a way to register functions for unit tests Doing unit tests with haproxy was always a bit difficult, some of the functions you want to test would depend on the buffer or trash buffer initialisation of HAProxy, so building a separate main() for them is quite hard. This patch adds a way to register a function that can be called with the "-U" parameter on the command line, will be executed just after step_init_1() and will exit the process with its return value as an exit code. When using the -U option, every keyword after this option is passed to the callback and could be used as a parameter, letting the capability to handle complex arguments if required by the test. HAProxy needs to be built with DEBUG_UNIT to activate this feature.	2025-03-03 12:43:32 +01:00
William Lallemand	4dc0ba233e	MINOR: jws: implement a JWK public key converter Implement a converter which takes an EVP_PKEY and converts it to a public JWK key. This is the first step of the JWS implementation. It supports both EC and RSA keys. Known to work with: - LibreSSL - AWS-LC - OpenSSL > 1.1.1	2025-03-03 12:43:32 +01:00
Willy Tarreau	730641f7ca	BUG/MINOR: server: check for either proxy-protocol v1 or v2 to send header As reported in issue #2882, using "no-send-proxy-v2" on a server line does not properly disable the use of proxy-protocol if it was enabled in a default-server directive in combination with other PP options. The reason for this is that the sending of a proxy header is determined by a test on srv->pp_opts without any distinction, so disabling PPv2 while leaving other options results in a PPv1 header to be sent. Let's fix this by explicitly testing for the presence of either send-proxy or send-proxy-v2 when deciding to send a proxy header. This can be backported to all versions. Thanks to Andre Sencioles (@asenci) for reporting the issue and testing the fix.	2025-03-03 04:05:47 +01:00
Olivier Houchard	706b008429	MEDIUM: servers: Add strict-maxconn. Maxconn is a bit of a misnomer when it comes to servers, as it doesn't control the maximum number of connections we establish to a server, but the maximum number of simultaneous requests. So add "strict-maxconn", that will make it so we will never establish more connections than maxconn. It extends the meaning of the "restricted" setting of tune.takeover-other-tg-connections, as it will also attempt to get idle connections from other thread groups if strict-maxconn is set.	2025-02-26 13:00:18 +01:00
Olivier Houchard	8de8ed4f48	MEDIUM: connections: Allow taking over connections from other tgroups. Allow haproxy to take over idle connections from other thread groups than our own. To control that, add a new tunable, tune.takeover-other-tg-connections. It can have 3 values, "none", where we won't attempt to get connections from the other thread group (the default), "restricted", where we only will try to get idle connections from other thread groups when we're using reverse HTTP, and "full", where we always try to get connections from other thread groups. Unless there is a special need, it is advised to use "none" (or restricted if we're using reverse HTTP) as using connections from other thread groups may have a performance impact.	2025-02-26 13:00:18 +01:00
Olivier Houchard	c36aae2af1	MINOR: pollers: Add a fixup_tgid_takeover() method. Add a fixup_tgid_takeover() method to pollers for which it makes sense (epoll, kqueue and evport). That method can be called after a takeover of a fd from a different thread group, to make sure the poller's internal structure reflects the new state.	2025-02-26 13:00:18 +01:00
Olivier Houchard	c5cc09c00d	MINOR: fd: Add fd_lock_tgid_cur(). Add fd_lock_tgid_cur(), a function that will lock the tgid, without modifying its value.	2025-02-26 13:00:18 +01:00
Olivier Houchard	52b97ff8dd	MEDIUM: fd: Wait if locked in fd_grab_tgid() and fd_take_tgid(). Wait while the tgid is locked in fd_grab_tgid() and fd_take_tgid(). As that lock is barely used, it should have no impact.	2025-02-26 13:00:18 +01:00
Willy Tarreau	fb7874c286	MINOR: tinfo: split the signal handler report flags into 3 While signals are not recursive, one signal (e.g. wdt) may interrupt another one (e.g. debug). The problem this causes is that when leaving the inner handler, it removes the outer's flag, hence the protection that comes with it. Let's just have 3 distinct flags for regular signals, debug signal and watchdog signal. We add a 4th definition which is an aggregate of the 3 to ease testing.	2025-02-24 13:37:52 +01:00
Vincent Dechenaux	9011b3621b	MINOR: compression: Introduce minimum size This is the introduction of "minsize-req" and "minsize-res". These two options allow you to set the minimum payload size required for compression to be applied. This helps save CPU on both server and client sides when the payload does not need to be compressed.	2025-02-22 11:32:40 +01:00
Willy Tarreau	29e246a84c	MINOR: freq_ctr: provide non-blocking read functions Some code called by the debug handlers in the context of a signal handler accesses to some freq_ctr and occasionally ends up on a locked one from the same thread that is dumping it. Let's introduce a non-blocking version that at least allows to return even if the value is in the process of being updated, it's less problematic than hanging.	2025-02-21 18:26:29 +01:00
Willy Tarreau	ddd173355c	MINOR: tinfo: add a new thread flag to indicate a call from a sig handler Signal handlers must absolutely not change anything, but some long and complex call chains may look innocuous at first glance, yet result in some subtle write accesses (e.g. pools) that can conflict with a running thread being interrupted. Let's add a new thread flag TH_FL_IN_SIG_HANDLER that is only set when entering a signal handler and cleared when leaving them. Note, we're speaking about real signal handlers (synchronous ones), not deferred ones. This will allow some sensitive call places to act differently when detecting such a condition, and possibly even to place a few new BUG_ON().	2025-02-21 17:41:38 +01:00
Aurelien DARRAGON	9561b9fb69	BUG/MINOR: sink: add tempo between 2 connection attempts for sft servers When the connection for sink_forward_{oc}_applet fails or a previous one is destroyed, the sft->appctx is instantly released. However process_sink_forward_task(), which may run at any time, iterates over all known sfts and tries to create sessions for orphan ones. It means that instantly after sft->appctx is destroyed, a new one will be created, thus a new connection attempt will be made. It can be an issue with tcp log-servers or sink servers, because if the server is unavailable, process_sink_forward() will keep looping without any temporisation until the applet survives (ie: connection succeeds), which results in unexpected CPU usage on the threads responsible for that task. Instead, we add a tempo logic so that a delay of 1 second is applied between two retries. Of course the initial attempt is not delayed. This could be backported to all stable versions.	2025-02-21 11:22:35 +01:00
Christopher Faulet	851e52b551	BUG/MEDIUM: spoe/mux-spop: Introduce an NOOP action to deal with empty ACK In the SPOP protocol, ACK frames with an empty payload are allowed. However, in that case, because only the payload is transferred, there is no data to return to the SPOE applet. Only the end of input is reported. Thus the applet is never woken up. It means that the SPOE filter will be blocked during the processing timeout and will finally return an error. To work around this issue, a NOOP action is introduced with the value 0. It is only an internal action for now. It does not exist in the SPOP protocol. When an ACK frame with an empty payload is received, this noop action is transferred to the SPOE applet, instead of nothing. Thanks to this trick, the applet is properly notified. This works because unknown actions are ignored by the SPOE filter. This patch must be backported to 3.1.	2025-02-20 11:56:27 +01:00
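
The capacity-based grouping described in the 4a6eaf6c5e commit message above (sort CPUs by capacity within a cluster, then treat each CPU as equal to the previous one when they differ by 5% or less) could be sketched roughly as follows. This is only an illustration and not HAProxy's code: the function name `group_by_capacity()`, the plain `int` arrays, and the choice to measure the 5% relative to the previous CPU's capacity are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from HAProxy). capa[] must already be
 * sorted in descending order. A group index is written to grp[] for each
 * CPU and the number of groups is returned. A CPU stays in the previous
 * CPU's group when its capacity is at most 5% below that previous CPU's
 * capacity (the reference point is an assumption of this sketch).
 */
static int group_by_capacity(const int *capa, int *grp, size_t n)
{
	int groups = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (i == 0)
			groups = 1;
		/* integer form of: (prev - cur) > 5% of prev */
		else if ((capa[i - 1] - capa[i]) * 20 > capa[i - 1])
			groups++;
		grp[i] = groups - 1;
	}
	return groups;
}
```

Because each CPU is only compared with its immediate predecessor, capacities such as 100, 96, 92, 88 end up in a single group even though the extremities differ by more than 5%, while 1024, 1000, 980, 600, 590 split into two groups at the large gap between 980 and 600.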

8215 Commits