MAJOR: server: do not remove idle conns in del server

Do not remove anymore idle and purgeable connections directly under the
"del server" handler. The main objective of this patch is to reduce the
amount of work performed under thread isolation. This should improve
"del server" scheduling with other haproxy tasks.

Another objective is to be able to properly support dynamic servers with
QUIC. Indeed, takeover is not yet implemented for this protocol, hence
it is not possible to rely on cleanup of idle connections performed by a
single thread under "del server" handler.

With this change it is not possible anymore to remove a server if there
is still idle connections referencing it. To ensure this cannot be
performed, srv_check_for_deletion() has been extended to check server
counters for idle and idle private connections.

Server deletion should still remain a viable procedure, as first it is
mandatory to put the targetted server into maintenance. This step forces
the cleanup of its existing idle connections. Thanks to a recent change,
all finishing connections are also removed immediately instead of
becoming idle. In short, this patch transforms idle connections removal
from a synchronous to an asynchronous procedure. However, this should
remain a steadfast and quick method achievable in less than a second.

This patch is considered major as some users may notice this change when
removing a server. In particular with the following CLI commands
pipeline:
  "disable server <X>; shutdown sessions server <X>; del server <X>"

Server deletion will now probably fail, as idle connections purge cannot
be completed immediately. Thus, it is now highly advise to always use a
small delay "wait srv-removable" before "del server" to ensure that idle
connections purge is executed prior.

Along with this change, documentation for "del server" and related
"shutdown sessions server" has been refined, in particular to better
highlight under what conditions a server can be removed.
This commit is contained in:
Amaury Denoyelle 2025-08-01 17:51:16 +02:00
parent dbe31e3f65
commit 7232677385
2 changed files with 30 additions and 78 deletions

View File

@ -2085,11 +2085,19 @@ del ssl crt-list <filename> <certfile[:line]>
numbers, use "show ssl crt-list -n <crtlist>". numbers, use "show ssl crt-list -n <crtlist>".
del server <backend>/<server> del server <backend>/<server>
Remove a server attached to the backend <backend>. All servers are eligible, Delete a removable server attached to the backend <backend>. A removable
except servers which are referenced by other configuration elements. The server is the server which satisfies all of these conditions :
server must be put in maintenance mode prior to its deletion. The operation - not referenced by other configuration elements
is cancelled if the server still has active or idle connection or its - must already be in maintenance (see "disable server")
connection queue is not empty. - must not have any active or idle connections
If any of these conditions is not met, the command will fail.
Active connections are those with at least one ongoing request. It is
possible to speed up their termination using "shutdown sessions server". It
is highly recommended to use "wait srv-removable" before "del server" to
ensure that all active or idle connections are closed and that the command
succeeds.
disable agent <backend>/<server> disable agent <backend>/<server>
Mark the auxiliary agent check as temporarily stopped. Mark the auxiliary agent check as temporarily stopped.
@ -4097,6 +4105,10 @@ shutdown sessions server <backend>/<server>
maintenance mode, for instance. Such terminated streams are reported with a maintenance mode, for instance. Such terminated streams are reported with a
'K' flag in the logs. 'K' flag in the logs.
Backend connections are left in idle state, unless the server is already in
maintenance mode, in which case they will be immediately scheduled for
deletion.
trace trace
The "trace" command alone lists the trace sources, their current status, and The "trace" command alone lists the trace sources, their current status, and
their brief descriptions. It is only meant as a menu to enter next levels, their brief descriptions. It is only meant as a menu to enter next levels,
@ -4311,12 +4323,13 @@ wait { -h | <delay> } [<condition> [<args>...]]
unsatisfied for the whole <delay> duration. The supported conditions are: unsatisfied for the whole <delay> duration. The supported conditions are:
- srv-removable <proxy>/<server> : this will wait for the specified server to - srv-removable <proxy>/<server> : this will wait for the specified server to
be removable, i.e. be in maintenance and no longer have any connection on be removable by the "del server" command, i.e. be in maintenance and no
it. Some conditions will never be accepted (e.g. not in maintenance) and longer have any connection on it (neither active or idle). Some conditions
will cause the report of a specific error message indicating what condition will never be accepted (e.g. not in maintenance) and will cause the report
is not met. The server might even have been removed in parallel and no of a specific error message indicating what condition is not met. The
longer exit. If everything is OK before the delay, a success is returned server might even have been removed in parallel and no longer exit. If
and the operation is terminated. everything is OK before the delay, a success is returned and the operation
is terminated.
The default unit for the delay is milliseconds, though other units are The default unit for the delay is milliseconds, though other units are
accepted if suffixed with the usual timer units (us, ms, s, m, h, d). When accepted if suffixed with the usual timer units (us, ms, s, m, h, d). When

View File

@ -6335,9 +6335,12 @@ int srv_check_for_deletion(const char *bename, const char *svname, struct proxy
/* Second, conditions that may change over time */ /* Second, conditions that may change over time */
ret = 0; ret = 0;
/* Ensure that there is no active/pending connection on the server. */ /* Ensure that there is no active/pending/idle connection on the server.
* Note that idle conns scheduled for purging are still accounted in idle counter.
*/
if (_HA_ATOMIC_LOAD(&srv->curr_used_conns) || if (_HA_ATOMIC_LOAD(&srv->curr_used_conns) ||
_HA_ATOMIC_LOAD(&srv->queueslength) || srv_has_streams(srv)) { _HA_ATOMIC_LOAD(&srv->queueslength) || srv_has_streams(srv) ||
_HA_ATOMIC_LOAD(&srv->curr_idle_conns) || _HA_ATOMIC_LOAD(&srv->curr_sess_idle_conns)) {
msg = "Server still has connections attached to it, cannot remove it."; msg = "Server still has connections attached to it, cannot remove it.";
goto leave; goto leave;
} }
@ -6362,11 +6365,9 @@ static int cli_parse_delete_server(char **args, char *payload, struct appctx *ap
struct proxy *be; struct proxy *be;
struct server *srv; struct server *srv;
struct ist be_name, sv_name; struct ist be_name, sv_name;
struct mt_list back;
struct sess_priv_conns *sess_conns = NULL;
struct watcher *srv_watch; struct watcher *srv_watch;
const char *msg; const char *msg;
int ret, i; int ret;
if (!cli_has_level(appctx, ACCESS_LVL_ADMIN)) if (!cli_has_level(appctx, ACCESS_LVL_ADMIN))
return 1; return 1;
@ -6395,68 +6396,6 @@ static int cli_parse_delete_server(char **args, char *payload, struct appctx *ap
goto out; goto out;
} }
/* Close idle connections attached to this server. */
for (i = tid;;) {
struct list *list = &srv->per_thr[i].idle_conn_list;
struct connection *conn;
while (!LIST_ISEMPTY(list)) {
conn = LIST_ELEM(list->n, struct connection *, idle_list);
if (i != tid) {
if (conn->mux && conn->mux->takeover)
conn->mux->takeover(conn, i, 1);
else if (conn->xprt && conn->xprt->takeover)
conn->xprt->takeover(conn, conn->ctx, i, 1);
}
conn_release(conn);
}
/* Also remove all purgeable conns as some of them may still
* reference the currently deleted server.
*/
while ((conn = MT_LIST_POP(&idle_conns[i].toremove_conns,
struct connection *, toremove_list))) {
conn_release(conn);
}
if ((i = ((i + 1 == global.nbthread) ? 0 : i + 1)) == tid)
break;
}
/* All idle connections should be removed now. */
BUG_ON(srv->curr_idle_conns);
/* Close idle private connections attached to this server. */
for (i = tid;;) {
MT_LIST_FOR_EACH_ENTRY_LOCKED(sess_conns, &srv->per_thr[i].sess_conns, srv_el, back) {
struct connection *conn, *conn_back;
list_for_each_entry_safe(conn, conn_back, &sess_conns->conn_list, sess_el) {
/* Only idle connections should be present if srv_check_for_deletion() is true. */
BUG_ON(!(conn->flags & CO_FL_SESS_IDLE));
--((struct session *)conn->owner)->idle_conns;
LIST_DEL_INIT(&conn->sess_el);
conn->owner = NULL;
if (sess_conns->tid != tid) {
if (conn->mux && conn->mux->takeover)
conn->mux->takeover(conn, sess_conns->tid, 1);
else if (conn->xprt && conn->xprt->takeover)
conn->xprt->takeover(conn, conn->ctx, sess_conns->tid, 1);
}
conn_release(conn);
}
LIST_DELETE(&sess_conns->sess_el);
pool_free(pool_head_sess_priv_conns, sess_conns);
sess_conns = NULL;
}
if ((i = ((i + 1 == global.nbthread) ? 0 : i + 1)) == tid)
break;
}
/* removing cannot fail anymore when we reach this: /* removing cannot fail anymore when we reach this:
* publishing EVENT_HDL_SUB_SERVER_DEL * publishing EVENT_HDL_SUB_SERVER_DEL
*/ */