haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2026-03-16 04:22:07 +01:00

Author	SHA1	Message	Date
Willy Tarreau	a79a67b52f	OPTIM: server: get rid of the last use of _ha_barrier_full() The code in srv_add_to_idle_list() has its roots in 2.0 with commit 9ea5d361ae ("MEDIUM: servers: Reorganize the way idle connections are cleaned."). At this era we didn't yet have the current set of atomic load/store operations and we used to perform loads using volatile casts after a barrier. It turns out that this function has kept this schema over the years, resulting in a big mfence stalling all the pipeline in the function: \| static __inline void \| __ha_barrier_full(void) \| { \| __asm __volatile("mfence" ::: "memory"); 27.08 \| mfence \| if ((volatile void *)srv->idle_node.node.leaf_p == NULL) { 0.84 \| cmpq $0x0,0x158(%r15) 0.74 \| je 35f \| return 1; Switching these for a pair of atomic loads got rid of this and brought 0.5 to 3% extra performance depending on the tests due to variations elsewhere, but it has never been below 0.5%. Note that the second load doesn't need to be atomic since it's protected by the lock, but it's cleaner from an API and code review perspective. That's also why it's relaxed. This was the last user of _ha_barrier_full(), let's try not to reintroduce it now!	2026-01-28 16:07:27 +00:00
Amaury Denoyelle	6c0ea1fe73	MINOR: proxy: remove proxy_preset_defaults() Function proxy_preset_defaults() purpose has evolved over time. Originally, it was only used to initialize defaults proxies instances. Until today, it was extended so that all proxies use it. Its objective is to initialize settings to common default values. To remove the confusion, this function is now removed. Its content is integrated directly into init_new_proxy().	2026-01-22 16:20:25 +01:00
Aurelien DARRAGON	d38b918da1	BUG/MINOR: server: ensure server is detached from proxy list before being freed There remained some cases (on error paths) where a server could be freed while still attached to the parent proxy server list. In 3.3 this can be problematic because new_server() automatically adds the server to the parent proxy list. The bug is insignificant because it is on error paths during init and often haproxy exits right after. But let's fix that to ensure no UAF or undefined behavior occurs because of that. This patch depends on ("MINOR: cli: use srv_drop() when server was created using new_server()"). It must be backported to 3.3 with the above-mentioned patch.	2026-01-19 14:24:04 +01:00
Aurelien DARRAGON	12dc9325a7	MINOR: cli: use srv_drop() when server was created using new_server() Now that new_server() is becoming more and more complex, we need to take care that servers created using new_server() must be released using the corresponding release function srv_drop() which takes care of properly de-initing the server and its members.	2026-01-19 14:23:58 +01:00
Olivier Houchard	7f4b053b26	MEDIUM: counters: mostly revert da813ae4d7cb77137ed Contrarily to what was previously believed, there are corner cases where the counters may not be allocated, and we may want to make them optional at a later date, so we have to check if those counters are there. However, just checking that shared.tg is non-NULL is enough, we can then assume that shared.tg[tgid - 1] has properly been allocated too. Also modify the various COUNTER_SHARED_* macros to make sure they check for that too.	2026-01-14 12:39:14 +01:00
Olivier Houchard	da813ae4d7	MEDIUM: counters: Remove some extra tests Before updating counters, a few tests are made to check if the counters exist, but those counters should always exist at this point, so just remove them. This commit should have no impact, but can easily be reverted with no functional impact if various crashes appear.	2026-01-13 11:12:34 +01:00
Olivier Houchard	5495c88441	MEDIUM: counters: Dynamically allocate per-thread group counters Instead of statically allocating the per-thread group counters, based on the max number of thread groups available, allocate them dynamically, based on the number of thread groups actually used. That way we can increase the maximum number of thread groups without using an unreasonable amount of memory.	2026-01-13 11:12:34 +01:00
Amaury Denoyelle	47dff5be52	MINOR: quic: implement cc-algo server keyword Extend QUIC server configuration so that congestion algorithm and maximum window size can be set on the server line. This can be achieved using quic-cc-algo keyword with a syntax similar to a bind line. This should be backported up to 3.3 as this feature is considered as necessary for full QUIC backend support. Note that this relies on the series of previous commits which should be picked first.	2025-12-01 15:53:58 +01:00
Amaury Denoyelle	a363b536a9	BUG/MINOR: server: fix srv_drop() crash on partially init srv A recent patch has introduced free operation for QUIC tokens stored in a server. These values are located in <per_thr> server array. However, a server instance may be released prior to its full initialization in case of a failure during "add server" CLI command. The mentioned patch would cause a srv_drop() crash due to an invalid usage of NULL <per_thr> member. Fix this by adding a check on <per_thr> prior to dereferencing it in srv_drop(). No need to backport.	2025-11-25 15:16:13 +01:00
Amaury Denoyelle	4b596c1ea8	BUG/MINOR: quic/server: free quic_retry_token on srv drop A recent patch has implemented caching of QUIC token received from a NEW_TOKEN frame into the server cache. This value is stored per thread into a <quic_retry_token> field. This field is an ist, first set to an empty string. Via qc_try_store_new_token(), it is reallocated to fit the size of the newly stored token. Prior to this patch, the field was never freed so this causes a memory leak. Fix this by using istfree() on <quic_retry_token> field during srv_drop(). No need to backport.	2025-11-25 14:30:18 +01:00
Willy Tarreau	4a6dec7193	DEBUG: servers: add a few checks for stress-testing idle conns The latest idle conns fix 9481cef948 ("BUG/MEDIUM: connection: do not reinsert a purgeable conn in idle list") addresses a very hard-to-hit case which manifests itself as an attempt to reuse a connection that fails because conn->mux is NULL: Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000655410b8642c in conn_backend_get (reuse_mode=4, srv=srv@entry=0x6554378a7140, sess=sess@entry=0x7cfe140948a0, is_safe=is_safe@entry=0, hash=hash@entry=910818338996668161) at src/backend.c:1390 1390 if (conn->mux->takeover && conn->mux->takeover(conn, i, 0) == 0) { However the condition that leads to this situation can be detected earlier, by the presence of the connection in the toremove_list, whose race window is much larger and easier to detect. This patch adds a few BUG_ON_STRESS() at selected places that can detect this condition. When built with -DDEBUG_STRESS and run under stress with two distinct processes communicating over H2 over SSL, under a stress of 400-500k req/s, the front process usually crashes in the first 10-30s triggering in _srv_add_idle() if the fix above is reverted (and it does not crash with the fix). This is mainly included to serve as an illustration of how to instrument the code for seamless stress testing.	2025-11-14 17:00:17 +01:00
Amaury Denoyelle	d79295d89b	Revert "BUG/MEDIUM: connections: permit to permanently remove an idle conn" The target patch fixes a rare race condition which happens when a MUX IO handler is working on a connection already moved into the purge list. In this case, the handler will incorrectly move the connection back into the idle list. To fix this, conn_delete_from_tree() was extended to remove flags along with the connection from the idle list. This was performed when the connection is moved into the purge list. However, it introduces another issue related to the idle server connection accounting. Thus it is necessary to revert it prior to the incoming newer fix. This patch must be backported to every version where the original commit is.	2025-11-14 16:06:34 +01:00
Willy Tarreau	0144426dfb	BUG/MEDIUM: server: close a race around ready_srv when deleting a server When a server is being disabled or deleted, in case it matches the backend's ready_srv, this one is reset. However it's currently done in a non-atomic way when the server goes down, and that could occasionally reset the entry matching another server, but more importantly if in parallel some requests are dequeued for that server, it may re-appear there after having been removed, leading to a possible crash once it is fully removed, as shown in issue #3177. Let's make sure we reset the pointer when detaching the server from the proxy, and use a CAS in both cases to only reset this server. This fix needs to be backported to 3.2. There, srv_detach() is in server.c instead of server.h. Thanks to Basha Mougamadou for the detailed report and the useful backtraces.	2025-11-06 19:57:44 +01:00
Willy Tarreau	5fe4677231	MINOR: server: move the lock inside srv_add_idle() Almost all callers of _srv_add_idle() lock the list then call the function. It's not the most efficient and it requires some care from the caller to take care of that lock. Let's change this a little bit by having srv_add_idle() that takes the lock and calls _srv_add_idle() that is now inlined. This way callers don't have to handle the lock themselves anymore, and the lock is only taken around the sensitive parts, not the function call+return. Interestingly, perf tests show a small perf increase from 2.28-2.32M RPS to 2.32-2.37M RPS on a 128-thread system.	2025-11-06 13:16:24 +01:00
Willy Tarreau	096999ee20	BUG/MEDIUM: connections: permit to permanently remove an idle conn There's currently a function conn_delete_from_tree() which is used to detach an idle connection from the tree it's currently attached to so that it is no longer found. This function is used in three circumstances: - when picking a new connection that no longer has any avail stream - when temporarily working on the connection from an I/O handler, in which case it's re-added at the end - when killing a connection The 2nd case above is quite specific, as it requires to preserve the CO_FL_LIST_MASK flags so that the connection can be re-inserted into the proper tree when leaving the handler. However, there's a catch. When killing a connection, we want to be certain it will not be reinserted into the tree. The flags preservation is causing a tiny race if an I/O happens while the connection is in the kill list, because in this case the I/O handler will note the connection flags, do its work, then reinsert the connection where it believed it was, then the connection gets purged, and another user can find it in the tree. The issue is very difficult to reproduce. On a 128-thread machine it happens in H2 around 500k req/s after around 50M requests. In H1 it happens after around 1 billion requests. The fix here consists in passing an extra argument to the function to indicate if the removal is permanent or not. When it's permanent, the function will clear the associated flags. The callers were adjusted so that all those dequeuing a connection in order to kill it do it permanently and all other ones do it only temporarily. A slightly different approach could have worked: the function could always remove all flags, and the callers would need to restore them. But this would require trickier modifications of the various call places, compared to only passing 0/1 to indicate the permanent status. This will need to be backported to all stable versions. 
The issue was at least reproduced since 3.1 (not tested before). The patch will need to be adjusted for 3.2 and older, because a 2nd argument "thr" was added in 3.3, so the patch will not apply to older versions as-is.	2025-11-05 11:08:25 +01:00
Olivier Houchard	06821dc189	BUG/MEDIUM: server: Also call srv_reset_path_parameters() on srv up Also call srv_reset_path_parameters() when the server changed states, and got up. It is not enough to do it when the server goes down, because there's a small race condition, and a connection could get established just after we did it, and could have set the path parameters. This does not need to be backported.	2025-11-04 18:47:34 +01:00
Olivier Houchard	7d4aa7b22b	BUG/MEDIUM: server: Add a rwlock to path parameter Add a rwlock to control the server's path_parameter, to make sure multiple threads don't set it at the same time, and it can't be seen in an inconsistent state. Also don't set the parameter every time, only set them if they have changed, to prevent needless writes. This does not need to be backported.	2025-11-04 18:47:34 +01:00
Amaury Denoyelle	73b5d331cc	OPTIM: quic: adjust automatic ALPN setting for QUIC servers If a QUIC server is declared without ALPN, "h3" value is automatically set during _srv_parse_finalize(). This patch adjusts this operation. Instead of relying on ssl_sock_parse_alpn(), a plain strdup() is used. This is considered more efficient as the ALPN string is constant in this case. This method is already used for listeners on the frontend side.	2025-10-31 11:32:20 +01:00
Amaury Denoyelle	14a6468df5	MINOR: quic: reject conf with QUIC servers if not compiled Ensure that QUIC support is compiled into haproxy when a QUIC server is configured. This check is performed during _srv_parse_finalize() so that it is detected both on configuration parsing and when adding a dynamic server via the CLI. Note that this changes the behavior of srv_is_quic() utility function. Previously, it always returned false when QUIC support wasn't compiled. With this new check introduced, it is now guaranteed that a QUIC server won't exist if compilation support is not active. Hence srv_is_quic() does not rely anymore on USE_QUIC define.	2025-10-31 11:32:20 +01:00
Amaury Denoyelle	1af3caae7d	MINOR: quic: enable SSL on QUIC servers automatically Previously, QUIC servers were rejected if SSL was not explicitly activated using 'ssl' configuration keyword. Change this behavior: now SSL is automatically activated for QUIC servers when the keyword is missing. A warning is displayed as it is considered better to explicitly note that SSL is in use.	2025-10-31 11:32:14 +01:00
Willy Tarreau	80ed9f9dcf	MINOR: tree-wide: add missing TAINTED flags for some experimental directives We normally taint the process when using experimental directives, but a handful of places were missed so we don't always know that they are in use. Let's fix these places (hint for future directives, just look for places checking for "experimental_directives_allowed", and add "mark_tainted(TAINTED_CONFIG_EXP_KW_DECLARED);").	2025-10-17 19:00:21 +02:00
Olivier Houchard	822ee90dc2	MEDIUM: servers: Schedule the server requeue target on creation On creation, schedule the server requeue once it's been created. It is possible that when the server went up, it tried to queue itself into the lb specific code, failed to do so, and expect the tasklet to run to take care of that. This should be backported to 3.2. This is part of an attempt to fix github issue #3143.	2025-10-01 18:13:33 +02:00
Aurelien DARRAGON	5c299dee5a	MEDIUM: stats: consider that shared stats pointers may be NULL This patch looks huge, but it has a very simple goal: protect all accesses to shared stats pointers (either reads or writes), because we now consider that these pointers may be NULL. The reason behind this is that despite all precautions taken to ensure the pointers shouldn't be NULL when not expected, there are still corner cases (ie: frontend stats used on a backend with no FE cap and vice versa) where we could try to access a memory area which is not allocated. Willy stumbled on such cases while playing with the rings servers upon connection error, which eventually led to process crashes (since 3.3 when shared stats were implemented). Also, we may decide later that shared stats are optional and should be disabled on the proxy to save memory and CPU, and this patch is a step further towards that goal. So in essence, this patch ensures shared stats pointers are always initialized (including NULL), and adds necessary guards before shared stats pointers are de-referenced. Since we already had some checks for backends and listeners stats, and the pointer address retrieval should stay in cpu cache, let's hope that this patch doesn't impact stats performance much.	2025-09-18 16:49:51 +02:00
Willy Tarreau	8c077c17eb	MINOR: server: add the "cc" keyword to set the TCP congestion controller It is possible on at least Linux and FreeBSD to set the congestion control algorithm to be used with outgoing connections, among the list of supported and permitted ones. Let's expose this setting with "cc". Unknown or forbidden algorithms will be ignored and the default one will continue to be used.	2025-09-17 17:19:33 +02:00
Willy Tarreau	2d6b5c7a60	MEDIUM: connection: reintegrate conn_hash_node into connection Previously the conn_hash_node was placed outside the connection due to the big size of the eb64_node that could have negatively impacted frontend connections. But having it outside also means that one extra allocation is needed for each backend connection, and that one memory indirection is needed for each lookup. With the compact trees, the tree node is smaller (16 bytes vs 40) so the overhead is much lower. By integrating it into the connection, We're also eliminating one pointer from the connection to the hash node and one pointer from the hash node to the connection (in addition to the extra object bookkeeping). This results in saving at least 24 bytes per total backend connection, and only inflates connections by 16 bytes (from 240 to 256), which is a reasonable compromise. Tests on a 64-core EPYC show a 2.4% increase in the request rate (from 2.08 to 2.13 Mrps).	2025-09-16 09:23:46 +02:00
Willy Tarreau	ceaf8c1220	MEDIUM: connection: move idle connection trees to ceb64 Idle connection trees currently require a 56-byte conn_hash_node per connection, which can be reduced to 32 bytes by moving to ceb64. While ceb64 is theoretically slower, in practice here we're essentially dealing with trees that almost always contain a single key and many duplicates. In this case, ceb64 insert and lookup functions become faster than eb64 ones because all duplicates are a list accessed in O(1) while it's a subtree for eb64. In tests it is impossible to tell the difference between the two, so it's worth reducing the memory usage. This commit brings the following memory savings to conn_hash_node (one per backend connection), and to srv_per_thread (one per thread and per server), in bytes as struct: before / after / delta: conn_hash_node: 56 / 32 / -24; srv_per_thread: 96 / 72 / -24. The delicate part is conn_delete_from_tree(), because we need to know the tree root the connection is attached to. But thanks to recent cleanups, it's now clear enough (i.e. idle/safe/avail vs session are easy to distinguish).	2025-09-16 09:23:46 +02:00
Willy Tarreau	95b8adff67	MINOR: connection: pass the thread number to conn_delete_from_tree() We'll soon need to choose the server's root based on the connection's flags, and for this we'll need the thread it's attached to, which is not always the current one. This patch simply passes the thread number from all callers. They know it because they just set the idle_conns lock on it prior to calling the function.	2025-09-16 09:23:46 +02:00
Willy Tarreau	efe519ab89	CLEANUP: backend: use a single variable for removed in srv_cleanup_idle_conns() Probably due to older code, there's a boolean variable used to set another one which is then checked. Also the first check is made under the lock, which is unnecessary. Let's simplify this and use a single variable. This only makes the code clearer, it doesn't change the output code.	2025-09-16 09:23:46 +02:00
Willy Tarreau	f7d1fc2b08	MINOR: server: pass the server and thread to srv_migrate_conns_to_remove() We'll need to have access to the srv_per_thread element soon from this function, and there's no particular reason for passing it list pointers so let's pass the server and the thread so that it is autonomous. It also makes the calling code simpler.	2025-09-16 09:23:46 +02:00
Willy Tarreau	d1c5df6866	CLEANUP: server: use eb64_entry() not ebmb_entry() to convert an eb64 There were a few leftovers from an earlier version of the conn_hash_node that was using ebmb nodes. A few calls to ebmb_first() and ebmb_entry() were still present while acting on an eb64 tree. These are harmless as one is just eb_first() and the other container_of(), but it's confusing so let's clean them up.	2025-09-16 09:23:46 +02:00
Willy Tarreau	d18d972b1f	MEDIUM: server: index server ID using compact trees The server ID is currently stored as a 32-bit int using an eb32 tree. It's used essentially to find holes in order to automatically assign IDs, and to detect duplicates. Let's change this to use compact trees instead in order to save 24 bytes in struct server for this node, plus 8 bytes in struct proxy. The server struct is still 3904 bytes large (due to alignment) and the proxy struct is 3072.	2025-09-16 09:23:46 +02:00
Willy Tarreau	5a5cec4d7a	MINOR: server: add server_index_id() to index a server by its ID This avoids needlessly exposing the tree's root and the mechanics outside of the low-level code.	2025-09-16 09:23:46 +02:00
Willy Tarreau	4ed4cdbf3d	CLEANUP: server: use server_find_by_id() when looking for already used IDs In srv_parse_id(), there's no point doing all the low-level work with the tree functions to check for the existence of an ID, we already have server_find_by_id() which does exactly this, so let's use it.	2025-09-16 09:23:46 +02:00
Willy Tarreau	0b0aefe19b	MINOR: server: add server_get_next_id() to find next free server ID This was previously achieved via the generic get_next_id() but we'll soon get rid of generic ID trees so let's have a dedicated server_get_next_id(). As a bonus it reduces the exposure of the tree's root outside of the functions.	2025-09-16 09:23:46 +02:00
Willy Tarreau	413e903a22	MEDIUM: server: switch conf.name to cebis_tree This is used to index the server name and it contains a copy of the pointer to the server's name in <id>. Changing that for a ceb_node placed just before <id> saves 32 bytes to the struct server, which remains 3968 bytes large due to alignment. The proxy struct shrinks by 8 bytes to 3144. It's worth noting that the current way duplicate names are handled remains based on the previous mechanism where dups were permitted. Ideally we should now reject them during insertion and use unique key trees instead.	2025-09-16 09:23:46 +02:00
Willy Tarreau	0e99f64fc6	MEDIUM: server: switch addr_node to cebis_tree This contains the text representation of the server's address, for use with stick-tables with "srvkey addr". Switching them to a compact node saves 24 more bytes from this structure. The key was moved to an external pointer "addr_key" right after the node. The server struct is now 3968 bytes (down from 4032) due to alignment, and the proxy struct shrinks by 8 bytes to 3152.	2025-09-16 09:23:46 +02:00
Olivier Houchard	ff47ae60f3	MEDIUM: server: Introduce the concept of path parameters Add a new field in struct server, path parameters. It will contain connection information for the server that is not expected to change. For now, just store the ALPN negotiated with the server. Each time a handshake is done, we'll update it, even though it is not supposed to change. This will be useful when trying to send early data, that way we'll know which mux to use. Each time the server goes down or is disabled, this information is erased, as we can't be sure those parameters will be the same once the server is back up.	2025-09-09 19:01:24 +02:00
Christopher Faulet	668916c1a2	MEDIUM: server/ssl: Base the SNI value to the HTTP host header by default For HTTPS outgoing connections, the SNI is now automatically set using the Host header value if no other value is already set (via the "sni" server keyword). It is now the default behavior. It can be disabled with the "no-sni-auto" server keyword, and the "sni-auto" server keyword may be used to reset any previous "no-sni-auto" setting. This option can be inherited from "default-server" settings. Finally, if no connection name is set via "pool-conn-name" setting, the selected value is used. The automatic selection of the SNI is enabled by default for all outgoing connections. But it is concretely used for HTTPS connections only. The expression used is "req.hdr(host),host_only". This patch should partially fix issue #3081. It only covers the server part. Another patch will add the feature for HTTP health-checks.	2025-09-05 15:56:42 +02:00
Christopher Faulet	a97bd0f505	BUG/MINOR: server: Update healthcheck when server settings are changed via CLI Not all changes are concerned. But when the SSL is enabled or disabled for a server, the healthcheck xprt must eventually be updated too. This happens when the healthcheck relies on the server settings. In the same spirit, when the healthcheck address and port are updated, we must fall back on the raw xprt if the SSL is not explicitly enabled for the healthcheck with a "check-ssl" parameter. This patch should be backported to all stable versions.	2025-09-05 15:56:42 +02:00
Christopher Faulet	f8f94ffc9c	BUG/MEDIUM: server: Use sni as pool connection name for SSL server only By default, for a given server, when no pool-conn-name is specified, the configured sni is used. However, this must only be done when SSL is in use for the server. Of course, it is uncommon to have a sni expression for a non-ssl server. But this may happen. In addition, the SSL may be disabled via the CLI. In that case, the pool-conn-name must be discarded if it was copied from the sni. And, we must of course take care to set it if the ssl is enabled. Finally, when the attach-srv action is checked, we now check the pool-conn-name expression. This patch should be backported as far as 3.0. It relies on "MINOR: server: Parse sni and pool-conn-name expressions in a dedicated function" which should be backported too.	2025-09-05 15:56:08 +02:00
Christopher Faulet	086a248645	MINOR: server: Parse sni and pool-conn-name expressions in a dedicated function This change is mandatory to fix an issue. The parsing of sni and pool-conn-name expressions (from string to expression) is now handled in a dedicated function. This will avoid to duplicate the same code at different places.	2025-09-05 11:32:21 +02:00
Aurelien DARRAGON	cb08bcb9d6	MINOR: counters: retrieve detailed errmsg upon failure with counters_{fe,be}_shared_prepare() counters_{fe,be}_shared_prepare now take an extra <errmsg> parameter that contains additional hints about the error in case of failure. It must be freed accordingly since it is allocated using memprintf	2025-09-03 15:59:17 +02:00
Christopher Faulet	f8b7299ee7	BUG/MINOR: server: Duplicate healthcheck's sni inherited from default server It is not really an issue, but the "check-sni" value inherited from a default server is not duplicated while the parameter value is duplicated during the parsing. So here there is a small leak if several "check-sni" parameters are used on the same server line. The previous value is never released. But to fix this issue, the value inherited from the default server must also be duplicated. At the end it is safer this way and consistent with the parsing of the "sni" parameter. It is harmless so there is no reason to backport this patch.	2025-09-01 15:45:05 +02:00
Christopher Faulet	f7a04b428a	BUG/MEDIUM: server: Duplicate healthcheck's alpn inherited from default server When "check-alpn" parameter is inherited from the default server, the value is not duplicated, the pointer of the default server is used. However, when this parameter is overridden, the old value is released. So the "check-alpn" value of the default server is released. So it is possible to have a UAF if another server inherits from the same default server. To fix the issue, the "check-alpn" parameter must be handled the same way the "alpn" is. The default value is duplicated. So it could be safely released if it is forced on the server line. This patch should fix the issue #3096. It must be backported to all stable versions.	2025-09-01 15:45:05 +02:00
Amaury Denoyelle	7232677385	MAJOR: server: do not remove idle conns in del server Do not remove idle and purgeable connections directly under the "del server" handler anymore. The main objective of this patch is to reduce the amount of work performed under thread isolation. This should improve "del server" scheduling with other haproxy tasks. Another objective is to be able to properly support dynamic servers with QUIC. Indeed, takeover is not yet implemented for this protocol, hence it is not possible to rely on cleanup of idle connections performed by a single thread under "del server" handler. With this change it is not possible anymore to remove a server if there are still idle connections referencing it. To ensure this cannot be performed, srv_check_for_deletion() has been extended to check server counters for idle and idle private connections. Server deletion should still remain a viable procedure, as first it is mandatory to put the targeted server into maintenance. This step forces the cleanup of its existing idle connections. Thanks to a recent change, all finishing connections are also removed immediately instead of becoming idle. In short, this patch transforms idle connections removal from a synchronous to an asynchronous procedure. However, this should remain a steadfast and quick method achievable in less than a second. This patch is considered major as some users may notice this change when removing a server. In particular with the following CLI commands pipeline: "disable server <X>; shutdown sessions server <X>; del server <X>" Server deletion will now probably fail, as idle connections purge cannot be completed immediately. Thus, it is now highly advised to always use a small delay "wait srv-removable" before "del server" to ensure that idle connections purge is executed prior. 
Along with this change, documentation for "del server" and related "shutdown sessions server" has been refined, in particular to better highlight under what conditions a server can be removed.	2025-08-28 15:08:35 +02:00
Amaury Denoyelle	dbe31e3f65	MEDIUM: session: account on server idle conns attached to session This patch adds a new member <curr_sess_idle_conns> on the server. It serves as a counter of idle connections attached on a session instead of regular idle/safe trees. This is used only for private connections. The objective is to provide a method to detect if there are idle connections still referencing a server. This will be particularly useful to ensure that a server is removable. Currently, this is not yet necessary as idle connections are directly freed via "del server" handler under thread isolation. However, this procedure will be replaced by an asynchronous mechanism outside of thread isolation. Careful: connections attached to a session but not idle will not be accounted by this counter. These connections can still be detected via srv_has_streams() so "del server" will be safe. This counter is maintained during the whole lifetime of a private connection. This is mandatory to guarantee "del server" safety and conforms with other idle server counters. What this means is that decrement is performed only when the connection transitions from idle to in use, or just prior to its deletion. For the first case, this is covered by session_get_conn(). The second case is trickier. It cannot be done via session_unown_conn() as a private connection may still live a little longer after its removal from session, most notably when scheduled for idle purging. Thus, conn_free() has been adjusted to handle the final decrement. Now, conn_backend_deinit() is also called for private connections if CO_FL_SESS_IDLE flag is present. This results in a call to srv_release_conn() which is responsible for decrementing server idle counters.	2025-08-28 15:08:35 +02:00
Amaury Denoyelle	7a6e3c1a73	MAJOR: server: implement purging of private idle connections When a server goes into maintenance, or if its IP address is changed, idle connections attached to it are scheduled for deletion via the purge mechanism. Connections are moved from server idle/safe list to the purge list relative to their thread. Connections are freed on their owning thread by the scheduled purge task. This patch extends this procedure to also handle private idle connections stored in sessions instead of servers. This is possible thanks to the <sess_conns> list server member. A call to the newly defined function session_purge_conns() is performed on each list element. This moves private connections from their session to the purge list alongside other server idle connections. This change relies on the series of previous commits which ensure that access to private idle connections is now thread-safe, with idle_conns lock usage and careful manipulation of private idle conns in input/output handlers. The main benefit of this patch is that now all idle connections targeting a server set in maintenance are removed. Previously, private connections would remain until their attached sessions were closed.	2025-08-28 15:08:35 +02:00
Amaury Denoyelle	b18b5e2f74	MINOR: server: cleanup idle conns for server in maint already stopped When a server goes into maintenance mode, its idle connections are scheduled for an immediate purge. However, this is not the case if the server is already in stopped state, for example due to a health check failure. Adjust _srv_update_status_adm() to ensure that idle connections are always scheduled for purge when going into maintenance in both cases. The main advantage of this patch is to ensure consistent behavior for server maintenance mode. Note that it will also become necessary as server deletion will be adjusted with a future patch. Idle connection closure won't be performed by "del server" handler anymore, so it's important to ensure that a full cleanup is always performed prior to executing it, else the server may not be removable during a certain delay.	2025-08-28 14:55:21 +02:00
Amaury Denoyelle	67df6577ff	MEDIUM: server: close new idle conns if server in maintenance Currently, when a server is set on maintenance mode, its idle connections are scheduled for purge. However, this does not prevent currently used connections from becoming idle later on, even if the server is still off. Change this behavior: an idle connection is now rejected by the server if it is in maintenance. This is implemented with a new condition in srv_add_to_idle_list() which returns an error value. In this case, muxes stream detach callback will immediately free the connection. A similar change is also performed in each MUX and SSL I/O handlers and in conn_notify_mux(). An idle connection is not reinserted in its idle list if server is in maintenance, but instead it is immediately freed.	2025-08-28 14:55:18 +02:00
Amaury Denoyelle	f234b40cde	MINOR: server: shard by thread sess_conns member Server member <sess_conns> is a mt_list which contains every backend connection attached to a session which targets this server. These connections are not present in idle server trees. The main utility of this list is to be able to cleanup these connections prior to removing a server via "del server" CLI. However, this procedure will be adjusted by a future patch. As such, <sess_conns> member must be moved into srv_per_thread struct. Effectively, this duplicates a list for every thread. This commit does not introduce functional change. Its goal is to ensure that these connections are now ordered by their owning thread, which will allow to implement a purge, similarly to idle connections attached to servers.	2025-08-28 14:52:29 +02:00
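
Illustration for commit 0144426dfb ("close a race around ready_srv when deleting a server"): the message describes using a CAS so that only the entry matching the departing server gets reset. The fragment below is a minimal sketch of that idea in portable C11 and is not HAProxy code. The struct layouts and the helper name proxy_reset_ready_srv() are hypothetical, and C11 <stdatomic.h> is an assumption made for portability rather than what the tree itself uses.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; the real HAProxy
 * structures are far larger and are declared differently.
 */
struct server {
	int id;
};

struct proxy {
	_Atomic(struct server *) ready_srv;
};

/* Reset px->ready_srv to NULL only if it still designates <srv>.
 * A plain "if (px->ready_srv == srv) px->ready_srv = NULL;" leaves a
 * window between the test and the store in which another thread may
 * publish a different server, which would then be wiped by mistake.
 * The compare-and-swap performs the test and the store as one step.
 * Returns 1 if the slot was reset, 0 if it pointed elsewhere.
 */
static int proxy_reset_ready_srv(struct proxy *px, struct server *srv)
{
	struct server *expected = srv;

	return atomic_compare_exchange_strong(&px->ready_srv, &expected,
	                                      (struct server *)NULL);
}
```

With such a helper, both the "server goes down" path and the "server is detached from the proxy" path could call the same function, and a slot that has meanwhile been reassigned to another server is left untouched.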

868 Commits