haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2025-11-09 04:51:01 +01:00

Author	SHA1	Message	Date
Amaury Denoyelle	7896edccdc	MINOR: quic: remove unused pacing burst in bind_conf/quic_cc_path Pacing burst size is now dynamic. As such, configuration value has been removed and related fields in bind_conf and quic_cc_path structures can be safely removed. This should be backported up to 3.1.	2025-01-23 17:40:48 +01:00
Ilia Shipitsin	89c62693da	BUG/MINOR: listener: handle a possible strdup() failure This defect was found by the coccinelle script "unchecked-strdup.cocci". It can be backported to all supported branches.	2024-12-25 12:41:08 +01:00
Aurelien DARRAGON	b167426b6b	BUG/MINOR: listener: fix potential null pointer dereference in listener_release() As reported by @Bbulatov on GH #2804, fe is found at multiple places in listener_release(): in some places it is first checked against NULL before being de-referenced while in some other places it is not, which is ambiguous and could hide a bug. In practise, fe cannot be NULL for now, but it might not be the case in the future as we want to keep the possibility to run isolated listeners (that is, without proxy attached). We've already ensured this was the case with a57786e ("BUG/MINOR: listener: null pointer dereference suspected by coverity"), but this promise was recently broken by 65ae134 ("BUG/MINOR: listener: Wake proxy's mngmt task up if necessary on session release"). Let's fix that by conditioning the block with an "else if" statement instead of a regular "else". No need for backport except if multi-connection protocols (ie: FTP) were to be backported as well.	2024-12-02 17:22:45 +01:00
Amaury Denoyelle	24cea66e07	MEDIUM: quic: define cubic-pacing congestion algorithm Define a new QUIC congestion algorithm token 'cubic-pacing' for quic-cc-algo bind keyword. This is identical to default cubic implementation, except that pacing is used for STREAM frames emission. This algorithm supports an extra argument to specify a burst size. This is stored into a new bind_conf member named quic_pacing_burst which can be reused to initialize quic path. Pacing support is still considered experimental. As such, 'cubic-pacing' can only be used with expose-experimental-directives set.	2024-11-19 16:20:58 +01:00
Christopher Faulet	1cc9340afd	MINOR: listener: Remove useless checks on the receiver protocol existence The receiver protocol is always set when a listener is created or cloned. At least for now. And there is no check on it at many places, except in listener_accept() function. So, let's remove remaining useless checks. That will avoid false Coverity reports in future. This patch should fix the issue #2631.	2024-11-06 09:35:01 +01:00
Oliver Dala	a889413f5e	BUG/MEDIUM: cli: Deadlock when setting frontend maxconn The proxy lock state isn't passed down to relax_listener through dequeue_proxy_listeners, which causes a deadlock in relax_listener when it tries to get that lock. Backporting: Older versions didn't have relax_listener and directly called resume_listener in dequeue_proxy_listeners. lpx should just be passed directly to resume_listener then. The bug was introduced in commit 001328873c352e5e4b1df0dcc8facaf2fc1408aa [cf: This patch should fix the issue #2726. It must be backported as far as 2.4]	2024-09-25 17:12:11 +02:00
Amaury Denoyelle	1de5f718cf	MINOR: quic/config: adapt settings to new conn buffer limit QUIC MUX buffer allocation limit is now directly based on the underlying congestion window size. previous static limit based on conn-tx-buffers is now unused. As such, this commit adds a warning to users to prevent that it is now obsolete. Secondly, update max-window-size setting. It is now the main entrypoint to limit both the maximum congestion window size and the number of QUIC MUX allocated buffer on emission. Remove its special value '0' which was used to automatically adjust it on now unused conn-tx-buffers.	2024-08-20 17:59:35 +02:00
Amaury Denoyelle	c24c8667b2	MINOR: quic: define max-window-size config setting Define a new global keyword tune.quic.frontend.max-window-size. This allows to set globally the maximum congestion window size for each QUIC frontend connections. The default value is 0. It is a special value which automatically derive the size from the configured QUIC connection buffer limit. This is similar to the previous "quic-cc-algo" behavior, which can be used to override the maximum window size per bind line.	2024-08-20 17:02:29 +02:00
Amaury Denoyelle	9fbe8b0334	CLEANUP: proto: rename TID affinity callbacks Since the following patch, protocol API to update a connection TID affinity has been extended. commit 1a43b9f32c71267e3cb514aa70a13c75adb20742 MINOR: proto: extend connection thread rebind API The single callback set_affinity has been split into 3 different functions which are called at different stages during listener_accept(), depending on accept queue push success or not. However, the naming was rendered confusing by the usage of function prefix 1 and 2. Rename proto callback related to TID affinity update and use the following names : * bind_tid_prep * bind_tid_commit * bind_tid_reset This commit should probably be backported at least up to 3.0 with the above patch. This is because the fix was recently backported and it would allow to keep changes minimal between the two versions. It could even be backported up to 2.8 if there is no major conflict.	2024-07-11 15:14:06 +02:00
Willy Tarreau	0cb8743209	BUILD: listener: silence a build warning about unused value without threads A variable introduced in commit 1a43b9f32c ("MINOR: proto: extend connection thread rebind API") is not used without threads and causes a build warning. Let's just mark it maybe_unused. Since the commit above is tagged for backporting, this one will need to be backported along with it.	2024-07-10 15:17:04 +02:00
Amaury Denoyelle	1a43b9f32c	MINOR: proto: extend connection thread rebind API MINOR: listener: define callback for accept queue push Extend API for connection thread rebind API by replacing single callback set_affinity by three different ones. Each one of them is used at a different stage of the operation : * set_affinity1 is used similarly to previous set_affinity * set_affinity2 is called directly from accept_queue_push_mp() when an entry has been found in accept ring. This operation cannot fail. * reset_affinity is called after set_affinity1 in case of failure from accept_queue_push_mp() due to no space left in accept ring. This is necessary for protocols which must reconfigure resources before fallback on the current tid. This patch does not have any functional changes. However, it will be required to fix crashes for QUIC connections when accept queue ring is full. As such, it must be backported with it.	2024-07-04 16:33:21 +02:00
Amaury Denoyelle	639e73f8f2	MINOR: counters: move freq-ctr from proxy/server into counters struct Move freq-ctr defined in proxy or server structures into their dedicated fe_counters/be_counters struct. Functionally, no change here. This commit will allow to convert rate stats column to generic one, which is mandatory to manipulate them in the stats-file.	2024-05-02 10:55:25 +02:00
Valentine Krasnobaeva	7041c078d6	MINOR: listener/protocol: add proto name in alerts Frontend and listen sections allow unlimited number of bind statements, and there is often one bind statement per supported protocol, like below: listen test mode http bind quic4@0.0.0.0:443 name quic ssl crt ... bind 0.0.0.0:443 name https ssl alpn http/1.1,h2 crt ... bind 0.0.0.0:8080 ... ... It seems useful to show the corresponding protocol name in alerts and warnings, when a problem occurs with port binding, connection resuming or sharding. This helps to figure out immediately, which bind statement has a wrong setting or which protocol module is the root cause of the issue.	2024-04-12 18:51:40 +02:00
Willy Tarreau	0db8b6034d	BUG/MINOR: listener: always assign distinct IDs to shards When sharded listeners were introduced in 2.5 with commit 6dfbef4145 ("MEDIUM: listener: add the "shards" bind keyword"), a point was overlooked regarding how IDs are assigned to listeners: they are just duplicated! This means that if a "option socket-stats" is set and a shard is configured, or multiple thread groups are enabled, then a stats dump will produce several lines with exactly the same socket name and ID. This patch tries to address this by trying to assign consecutive numbers to these sockets. The usual algo is maintained, but with a preference for the next number in a shard. This will help users reserve ranges for each socket, for example by using multiples of 100 or 1000 on each bind line, leaving enough room for all shards to be assigned. The mechanism however is quite tricky, because the configured listener currently ends up being the last one of the shard. This helps insert them before the current position without having to revisit them. But here it causes a difficulty which is that we'd like to restart from the current ID and assign new ones on top of it. What is done is that the number is passed between shards and the current one is cleared (and removed from the tree) so that we instead insert the new one. It's tricky because of the situation which depends whether it's the listener that was already assigned on the bind line or not. But overall, always removing the entry, always adding the new one when the ID is not zero, and passing them from the reference to the next one does the trick. This may be backported to all versions till 2.6.	2024-04-09 08:57:02 +02:00
Amaury Denoyelle	0489d85263	MINOR: listener: implement GUID support This commit is similar to the two previous ones. Its purpose is to add GUID support on listeners. Due to bind_conf and listeners configuration, some specificities were required. It's possible to define several listeners on a single bind line, for example by specifying multiple addresses. As such, it's impossible to support a "guid" keyword on a bind line. The problem is exacerbated by the cloning of listeners when sharding is used. To resolve this, a new keyword "guid-prefix" is defined for bind lines. It allows to specify a string which will be used as a prefix for automatically generated GUID for each listeners attached to a bind_conf. Automatic GUID listeners generation is implemented via a new function bind_generate_guid(). It is called on post-parsing, after bind_complete_thread_setup(). For each listeners on a bind_conf, a new GUID is generated with bind_conf prefix and the index of the listener relative to other listeners in the bind_conf. This last value is stored in a new bind_conf field named <guid_idx>. If a GUID cannot be inserted, for example due to a non-unique value, an error is returned, startup is interrupted with configuration rejected.	2024-04-05 15:40:42 +02:00
Christopher Faulet	f31a4e302e	BUG/MINOR: listener: Don't schedule frontend without task in listener_release() null pointer dereference was reported by Coverity in listener_release() function. Indeed, we must not try to schedule frontend without task when a limit is still blocking the frontend. This issue was introduced by commit 65ae1347c7 ("BUG/MINOR: listener: Wake proxy's mngmt task up if necessary on session release") This patch should fix issue #2488. It must be backported to all stable version with the commit above.	2024-03-14 09:34:36 +01:00
Christopher Faulet	65ae1347c7	BUG/MINOR: listener: Wake proxy's mngmt task up if necessary on session release When a session is released, listener_release() function is called to notify the listener. It is an opportunity to resume limited/full listeners. We first try to resume the listener owning the released session, then all limited listeners in the global queue and finally all limited listeners in the frontend's waiting queue. This last step is only performed if there is no limit applied on the frontend. Nothing is performed if the session rate is still limited. And it is an issue because if this happens for the last listener's session, there is no other event to wake the frontend's management task up and the listener remains in the limited state. To fix the issue, when a limit is still applied on the frontend, we must compute the new wake up date from the sessions rate and schedule the frontend's management task. It is easy to reproduce the issue in SSL by setting a maxconn and a rate limit on sessions. This patch should fix the issue #2476. It must be backported to all stable versions.	2024-03-13 15:20:06 +01:00
Amaury Denoyelle	9b806550b7	MINOR: quic: warn on bind on multiple addresses if no IP_PKTINFO support Binding on multiple addresses for QUIC is safe only if IP_PKTINFO or equivalent is available. Else, the behavior may be undefined as the system is responsible to choose the network interface and source address on response. This commit adds a warning on boot if no or partial support for IP_PKTINFO or equivalent is detected and configuration contains UDP binding on multiple addresses. This should be backported up to 2.6. Special backport recommendations : * change ha_warning() to ha_diag_warning() to ensure no spurious warnings will be triggered on stable releases * IP_PKTINFO usage was introduced on 2.7. For 2.6, multiple addresses QUIC binding is always unreliable. As such, preprocessor condition must simply be removed so that the warning is always active regardless of the system. Warning message should also be truncated to suppress IP_PKTINFO reference.	2024-02-20 16:40:14 +01:00
Amaury Denoyelle	86e5c607d1	MINOR: rhttp: mark reverse HTTP as experimental Mark the reverse HTTP feature as experimental. This will allow to adjust if needed the configuration mechanism with future developments without maintaining retro-compatibility. Concretely, each config directives linked to it now requires to specify first global expose-experimental-directives before. This is the case for the following directives : - rhttp@ prefix used in bind and server lines - nbconn bind keyword - attach-srv tcp rule Each documentation section referring to these keywords are updated to highlight this new requirement. Note that this commit has duplicated on several places the code from the global function check_kw_experimental(). This is because the latter only work with cfg_keyword type. This is not adapted with bind_kw or action_kw types. This should be improved in a future patch.	2023-11-30 15:04:27 +01:00
Amaury Denoyelle	71ed381249	MINOR: listener: allow thread kw for rhttp bind Thanks to previous commit, a reverse HTTP listener is able to distribute actively opened connections across its threads. To be able to exploit this, allow "thread" keyword for such a listener. An extra check is added to explicitly forbid a reverse bind to span multiple thread groups. Without this, multiple listeners instances will be created, each with its own "nbconn" value. This may surprise users so for now, better to deactivate this possibility.	2023-11-23 17:46:00 +01:00
Amaury Denoyelle	55e78ff7e1	MINOR: rhttp: large renaming to use rhttp prefix Previous commit renames 'proto_reverse_connect' module to 'proto_rhttp'. This commit follows this by replacing various custom prefix by 'rhttp_' to make the code uniform. Note that 'reverse_' prefix was kept in connection module. This is because if a new reversible protocol not based on HTTP is implemented, it may be necessary to reuse the same connection function which are protocol agnostic.	2023-11-23 17:40:01 +01:00
Frédéric Lécaille	028a55a1d0	MINOR: quic: Add a max window parameter to congestion control algorithms Add a new ->max_cwnd member to bind_conf struct to store the maximum congestion control window value for each QUIC binding. Modify the "quic-cc-algo" keyword parsing to add an optional parameter to its value: the maximum congestion window value between parentheses as follows: ex: quic-cc-algo cubic(10m) This value must be bounded, greater than 10k and smaller than 1g.	2023-11-13 17:53:18 +01:00
Amaury Denoyelle	7735cf3854	MEDIUM: quic: count quic_conn instance for maxconn Increment actconn and check maxconn limit when a quic_conn is instantiated. This is necessary because prior to this patch, quic_conn instances were not counted. Global actconn was only incremented after the handshake has been completed and the connection structure is allocated. The increment is done using increment_actconn() on INITIAL packet parsing if a new connection is about to be created. If the limit is reached, the allocation is cancelled and the INITIAL packet is dropped. The decrement is done under quic_conn_release(). This means that quic_cc_conn instances are not taken into account. This seems safe enough because quic_cc_conn are only used for minimal usage. The counterpart of this change is that maxconn must not be checked a second time when listener_accept() is done over a QUIC connection. For this, a new bind_conf flag BC_O_XPRT_MAXCONN is set for listeners when maxconn is already counted by the lower layer. For the moment, it is positioned only for QUIC listeners. Without this patch, haproxy process could suffer from heavy memory/CPU load if the number of concurrent handshake is high. This patch is not considered a bug fix per-se. However, it has a major benefit to protect against too many QUIC handshakes. As such, it should be backported up to 2.6. For this, it relies on the following patch : "MINOR: frontend: implement a dedicated actconn increment function"	2023-10-26 15:35:56 +02:00
Amaury Denoyelle	fffd435bbd	MINOR: frontend: implement a dedicated actconn increment function When a new frontend connection is instantiated, actconn global counter is incremented. If global maxconn value is reached, the connection is cancelled. This ensures that system limit are under control. Prior to this patch, the atomic check/increment operations were done directly into listener_accept(). Move them in a dedicated function increment_actconn() in frontend module. This will be useful when QUIC connections will be counted in actconn counter.	2023-10-26 15:18:48 +02:00
Amaury Denoyelle	f70cf28539	MINOR: listener: forbid most keywords for reverse HTTP bind Reverse HTTP bind is very specific in that in rely on a server to initiate connection. All connection settings are defined on the server line and ignored from the bind line. Before this patch, most of keywords were silently ignored. This could result in a configuration from doing unexpected things from the user point of view. To improve this situation, add a new 'rhttp_ok' field in bind_kw structure. If not set, the keyword is forbidden on a reverse bind line and will cause a fatal config error. For the moment, only the following keywords are usable with reverse bind 'id', 'name' and 'nbconn'. This change is safe as it's already forbidden to mix reverse and standard addresses on the same bind line.	2023-10-20 17:28:08 +02:00
Amaury Denoyelle	3222047a14	MINOR: listener: add nbconn kw for reverse connect Previously, maxconn keyword was reused for a specific usage on reverse HTTP binds to specify the number of active connect to proceed. To avoid confusion, introduce a new dedicated keyword 'nbconn' which is specific to reverse HTTP bind. This new keyword is forbidden for non-reverse listener. A fatal error is emitted during config parsing if this rule is not respected. It's safe because it's also forbidden to mix standard and reverse addresses on the same bind line. Internally, nbconn value will be reassigned to 'maxconn' member of bind_conf structure. This ensures that listener layer will automatically reenable the preconnect task each time a connection is closed.	2023-10-20 14:44:37 +02:00
Amaury Denoyelle	3ef6df7387	MINOR: quic: define quic-socket bind setting Define a new bind option quic-socket : quic-socket [ connection | listener ] This new setting works in conjunction with the existing configuration global tune.quic.socket-owner and reuse the same semantics. The purpose of this setting is to allow to disable connection socket usage on listener instances individually. This will notably be useful to deactivate it after encountering a fatal permission error on bind() at runtime.	2023-10-03 16:49:26 +02:00
Amaury Denoyelle	47f502df5e	MEDIUM: proto_reverse_connect: bootstrap active reverse connection Implement active reverse connection initialization. This is done through a new task stored in the receiver structure. This task is instantiated via bind callback and first woken up via enable callback. Task handler is separated into two halves. On the first step, a new connection is allocated and stored in <pend_conn> member of the receiver. This new client connection will proceed to connect using the server instance referenced in the bind_conf. When connect has successfully been executed and HTTP/2 connection is ready for exchange after SETTINGS, reverse_connect task is woken up. As <pend_conn> is still set, the second half is executed which only execute listener_accept(). This will in turn execute accept_conn callback which is defined to return the pending connection. The task is automatically requeued inside accept_conn callback if bind maxconn is not yet reached. This allows to specify how many connection should be opened. Each connection is instantiated and reversed serially one by one until maxconn is reached. conn_free() has been modified to handle failure if a reverse connection fails before being accepted. In this case, no session exists to notify about the failure. Instead, reverse_connect task is requeued with a 1 second delay, giving time to fix a possible network issue. This will allow to attempt a new connection reverse. Note that for the moment connection rebinding after accept is disabled for simplicity. Extra operations are required to migrate an existing connection and its stack to a new thread which will be implemented later.	2023-08-24 17:03:06 +02:00
Amaury Denoyelle	0747e493a0	MINOR: proto_reverse_connect: parse rev@ addresses for bind Implement parsing for "rev@" addresses on bind line. On config parsing, server name is stored on the bind_conf. Several new callbacks are defined on reverse_connect protocol to complete parsing. listen callback is used to retrieve the server instance from the bind_conf server name. If found, the server instance is stored on the receiver. Checks are implemented to ensure HTTP/2 protocol only is used by the server.	2023-08-24 17:02:37 +02:00
Christopher Faulet	ff1c803279	BUG/MEDIUM: listener: Acquire proxy's lock in relax_listener() if necessary Listener functions must follow a common locking pattern: 1. Get the proxy's lock if necessary 2. Get the protocol's lock if necessary 3. Get the listener's lock if necessary We must take care to respect this order to avoid any ABBA issue. However, an issue was introduced in the commit bcad7e631 ("MINOR: listener: add relax_listener() function"). relax_listener() gets the listener's lock and if resume_listener() is called, the proxy's lock is then acquired. So to fix the issue, the proxy's lock is first acquired in relax_listener(), if necessary. This patch should fix the issue #2222. It must be backported as far as 2.4 because the above commit is marked to be backported there.	2023-07-21 15:08:27 +02:00
Willy Tarreau	9615102b01	MINOR: stats: report the number of times the global maxconn was reached As discussed a few times over the years, it's quite difficult to know how often we stop accepting connections because the global maxconn was reached. This is not easy to know because when we reach the limit we stop accepting but we don't know if incoming connections are pending, so it's not possible to know how many were delayed just because of this. However, an interesting equivalent metric consists in counting the number of times an accepted incoming connection resulted in the limit being reached. I.e. "we've accepted the last one for now". That doesn't imply any other one got delayed but it's a factual indicator that something might have been delayed. And by counting the number of such events, it becomes easier to know whether some limits need to be adjusted because they're reached often, or if it's exceptionally rare. The metric is reported as a counter in show info and on the stats page in the info section right next to "maxconn".	2023-05-11 13:51:31 +02:00
Ilya Shipitsin	83f54b9aef	CLEANUP: src/listener.c: remove redundant NULL check fixes #2031 quoting Willy Tarreau: "Originally the listeners were intended to work without a bind_conf (e.g. for FTP processing) hence these tests, but over time the bind_conf has become omnipresent"	2023-05-11 05:30:03 +02:00
Willy Tarreau	7310164b2c	MINOR: listener: add a new global tune.listener.default-shards setting This new setting accepts "by-process", "by-group" and "by-thread" and will dictate how listeners will be sharded by default when nothing is specified. While the default remains "by-process", "by-group" should be much more efficient with many threads, while not changing anything for single-group setups.	2023-04-23 09:46:15 +02:00
Willy Tarreau	c38499ceae	MINOR: listener: do not restrict CLI to first group anymore Now that we're able to run listeners on any set of groups, we don't need to maintain a special case about the stats socket anymore. It used to be forced to group 1 only so as to avoid startup failures in case several groups were configured, but if it's done now, it will automatically bind the needed FDs to have one per group so this is no more an issue.	2023-04-23 09:46:15 +02:00
Willy Tarreau	8a5e6f4cca	MINOR: protocol: add a function to check if some features are supported The new function protocol_supports_flag() checks the protocol flags to verify if some features are supported, but will support being extended to refine the tests. Let's use it to check for REUSEPORT.	2023-04-23 09:46:15 +02:00
Willy Tarreau	c1fbdd6397	MINOR: listener: automatically adjust shards based on support for SO_REUSEPORT Now if multiple shards are explicitly requested, and the listener's protocol doesn't support SO_REUSEPORT, sharding is disabled, which will result in the socket being automatically duped if needed. A warning is emitted when this happens. If "shards by-group" or "shards by-thread" are used, these will automatically be turned down to 1 since we want this to be possible easily using -dR on the command line without having to adjust the config. For "by-thread", a diag warning will be emitted to help troubleshoot possible performance issues.	2023-04-23 09:46:15 +02:00
Willy Tarreau	a22db6567f	MEDIUM: peers: call bind_complete_thread_setup() to finish the config The listeners in peers sections were still not handing the thread groups fine. Shards were silently ignored and if a listener was bound to more than one group, it would simply fail. Now we can call the dedicated function to resolve all this and possibly create the missing extra listeners. bind_complete_thread_setup() was adjusted to use the proxy_type_str() instead of writing "proxy" at the only place where this word was still hard-coded so that we continue to speak about peers sections when relevant.	2023-04-23 09:46:15 +02:00
Willy Tarreau	f6a8444f55	REORG: listener: move the bind_conf's thread setup code to listener.c What used to be only two lines to apply a mask in a loop in check_config_validity() grew into a 130-line block that performs deeply listener-specific operations that do not have their place there anymore. In addition it's worth noting that the peers code still doesn't support shards nor being bound to more than one group, which is a second reason for moving that code to its own function. Nothing was changed except recreating the missing variables from the bind_conf itself (the fe only).	2023-04-23 09:46:15 +02:00
Tim Duesterhus	b1ec21d259	CLEANUP: Stop checking the pointer before calling `tasklet_free()` Changes performed with this Coccinelle patch: @@ expression e; @@ - if (e != NULL) { tasklet_free(e); - } @@ expression e; @@ - if (e) { tasklet_free(e); - } @@ expression e; @@ - if (e) tasklet_free(e); @@ expression e; @@ - if (e != NULL) tasklet_free(e); See GitHub Issue #2126	2023-04-23 00:28:25 +02:00
Willy Tarreau	8adffaa899	MINOR: listener: always compare the local thread as well By comparing the local thread's load with the least loaded thread's load, we can further improve the fairness and at the same time also improve locality since it allows a small ratio of connections not to be migrated. This is visible on CPU usage with long connections on very large thread counts (224) and high bandwidth (200G). The cost of checking the local thread's load remains fairly low so there's no reason not to do this. We continue to update the index if we select the local thread, because it means that the two other threads were both more loaded so we'd rather find better ones.	2023-04-21 17:41:26 +02:00
Willy Tarreau	ff18504d73	MINOR: listener: make sure to avoid ABA updates in per-thread index One limitation of the current thread index mechanism is that if the values are assigned multiple times to the same thread and the index loops, it can match again the old value, which will not prevent a competing thread from finishing its CAS and assigning traffic to a thread that's not the optimal one. The probability is low but the solution is simple enough and consists in implementing an update counter in the high bits of the index to force a mismatch in this case (assuming we don't try to cover for extremely unlikely cases where the update counter loops while the index remains equal). So let's do that. In order to improve the situation a little bit, we now set the index to a ulong so that in 32 bits we have 8 bits of counter and in 64 bits we have 40 bits.	2023-04-21 17:41:26 +02:00
Willy Tarreau	77e33509c8	MINOR: listener: resync with the thread index before heavy calculations During heavy accept competition, the CAS will occasionally fail and we'll have to go through all the calculation again. While the first two loops look heavy, they're almost never taken so they're quite cheap. However the rest of the operation is heavy because we have to consult connection counts and queue indexes for other threads, so better double-check if the index is still valid before continuing. Tests show that it's more efficient to retry half-way like this.	2023-04-21 17:41:26 +02:00
Willy Tarreau	b657492680	MINOR: listener: use a common thr_idx from the reference listener Instead of seeing each listener use its own thr_idx, let's use the same for all those from a shard. It should provide more accurate and smoother thread allocation.	2023-04-21 17:41:26 +02:00
Willy Tarreau	9d360604bd	MEDIUM: listener: rework thread assignment to consider all groups Till now threads were assigned in listener_accept() to other threads of the same group only, using a single group mask. Now that we have all the relevant info (array of listeners of the same shard), we can spread the thr_idx to cover all assigned groups. The thread indexes now contain the group number in their upper bits, and the indexes run over the whole list of threads, all groups included. One particular subtlety here is that switching to a thread from another group also means switching the group, hence the listener. As such, when changing the group we need to update the connection's owner to point to the listener of the same shard that is bound to the target group.	2023-04-21 17:41:26 +02:00
Willy Tarreau	e6f5ab5afa	MINOR: listener: make accept_queue index atomic There has always been a race when checking the length of an accept queue to determine which one is more loaded that another, because the head and tail are read at two different moments. This is not required, we can merge them as two 16 bit numbers inside a single 32-bit index that is always accessed atomically. This way we read both values at once and always have a consistent measurement.	2023-04-21 17:41:26 +02:00
Willy Tarreau	aae1810b4d	MINOR: receiver: add a struct shard_info to store info about each shard In order to create multiple receivers for one multi-group shard, we'll need some more info about the shard. Here we store: - the number of groups (= number of receivers) - the number of threads (will be used for accept LB) - pointer to the reference rx (to get the FD and to find all threads) - pointers to the other members (to iterate over all threads) For now since there's only one group per shard it remains simple. The listener deletion code already takes care of removing the current member from its shards list and moving others' reference to the last one if it was their reference (so as to avoid o(n^2) updates during ordered deletes). Since the vast majority of setups will not use multi-group shards, we try to save memory usage by only allocating the shard_info when it is needed, so the principle here is that a receiver shard_info==NULL is alone and doesn't share its socket with another group. Various approaches were considered and tests show that the management of the listeners during boot makes it easier to just attach to or detach from a shard_info and automatically allocate it if it does not exist, which is what is being done here. For now the attach code is not called, but detach is already called on delete.	2023-04-21 17:41:26 +02:00
Willy Tarreau	84fe1f479b	MINOR: listener: support another thread dispatch mode: "fair" This new algorithm for rebalancing incoming connections to multiple threads is simpler and instead of considering the threads load, it will only cycle through all of them, offering a fair share of the traffic to each thread. It may be well suited for short-lived connections but is also convenient for very large thread counts where it's not always certain that the least loaded thread will always be found.	2023-04-21 17:41:26 +02:00
Willy Tarreau	6a4d48b736	MINOR: quic_sock: index li->per_thr[] on local thread id, not global one There's a li_per_thread array in each listener for use with QUIC listeners. Since thread groups were introduced, this array can be allocated too large because global.nbthread is allocated for each listener, while only no more than MIN(nbthread,MAX_THREADS_PER_GROUP) may be used by a single listener. This was because the global thread ID is used as the index instead of the local ID (since a listener may only be used by a single group). Let's just switch to local ID and reduce the allocated size.	2023-04-21 17:41:26 +02:00
Amaury Denoyelle	0783a7b08e	MINOR: listener: remove unneeded local accept flag Remove the receiver RX_F_LOCAL_ACCEPT flag. This was used by QUIC protocol before thread rebinding was supported by the quic_conn layer. This should be backported up to 2.7 after the previous patch has also been taken.	2023-04-18 17:09:34 +02:00
Amaury Denoyelle	a66e04338e	MINOR: protocol: define new callback set_affinity Define a new protocol callback set_affinity. This function is used during listener_accept() to notify about a rebind on a new thread just before pushing the connection on the selected thread queue. If the callback fails, accept is done locally. This change will be useful for protocols with state allocated before accept is done. For the moment, only QUIC protocol is concerned. This will allow to rebind the quic_conn to a new thread depending on its load. This should be backported up to 2.7 after a period of observation.	2023-04-18 16:54:52 +02:00
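
Illustration for commit e6f5ab5afa ("MINOR: listener: make accept_queue index atomic"): the message describes merging the head and tail of an accept queue as two 16-bit numbers inside a single 32-bit index that is always accessed atomically. The following is only a minimal sketch of that idea, not the actual HAProxy code. The names RING_SIZE, idx_pack(), ring_len() and ring_reserve(), the ring size and the choice of which half holds the head are all assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical ring size: a power of two that fits in 16 bits. */
#define RING_SIZE 256u

/* Assumed layout: head in the low 16 bits, tail in the high 16 bits. */
static inline uint32_t idx_pack(uint16_t head, uint16_t tail)
{
    return ((uint32_t)tail << 16) | head;
}

static inline uint16_t idx_head(uint32_t idx) { return (uint16_t)(idx & 0xFFFFu); }
static inline uint16_t idx_tail(uint32_t idx) { return (uint16_t)(idx >> 16); }

/* Number of queued entries, computed from ONE atomic snapshot so that
 * head and tail are always read at the same moment.
 */
static inline unsigned ring_len(_Atomic uint32_t *idx)
{
    uint32_t snap = atomic_load(idx);

    return (idx_tail(snap) + RING_SIZE - idx_head(snap)) % RING_SIZE;
}

/* Reserve one slot at the tail with a CAS loop. Returns the reserved
 * slot number, or -1 when the ring is full (one slot is kept empty so
 * that "full" and "empty" can be told apart).
 */
static inline int ring_reserve(_Atomic uint32_t *idx)
{
    uint32_t old = atomic_load(idx);

    for (;;) {
        uint16_t head = idx_head(old);
        uint16_t tail = idx_tail(old);
        uint16_t next = (uint16_t)((tail + 1u) % RING_SIZE);

        if (next == head)
            return -1;
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(idx, &old, idx_pack(head, next)))
            return tail;
    }
}
```

Because both halves live in one word, a thread comparing the load of two queues never sees a head from one moment paired with a tail from another, which is the race the commit message describes.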


348 Commits