From 4173f4ea2986d344850f4c19160fd731bef5833e Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Fri, 29 Apr 2022 15:04:41 +0200 Subject: [PATCH] BUG/MINOR: conn_stream: do not confirm a connection from the frontend path In issue #1468 it was reported that sometimes server-side connection attempts were only validated after the "timeout connect" value, and that would only happen with an H2 client. A long code analysis with the output dumps showed only one possible call path: an I/O event on the frontend while reading had just been disabled calls h2_wake() which in turns wakes cs_conn_io_cb(), which tries cs_conn_process() and cs_notify(), which sees that the other side is not blocked (already in CS_ST_CON) and tries cs_chk_snd() on it. But on that side the connection had just finished to be set up and not yet woken the stream up, cs_notify() would then call cs_conn_send() which succeeds and passes the connection to CS_ST_RDY. The problem is that nothing new happened on the frontend side so there's no reason to wake the stream up and the backend-side conn_stream remains in CS_ST_RDY state with the stream never being woken up. Once the "timeout connect" strikes, process_stream() is woken up and finds the connection finally setup, so it ignores the timeout and goes on. The number of conditions to meet to reproduce this is huge, which also explains why the reporter says it's "occasional" and we were never able to reproduce it in the lab. It needs at least reads to be disabled and immediately re-enabled on the frontend side (e.g. buffer full) with an I/O even reported before the poller had an opportunity to be disabled but with no subscribe being reinstalled, so that sock_conn_iocb() has no other choice but calling h2_wake(), and exactly at the same time the backend connection must finish to set up so that it was not yet reported by the poller, the data were sent and the polling for writes disabled. Several factors are to be considered here: - h2_wake() should probably not call h2_wake_some_streams() for ret >= 0 (common case), but only if some special event is reported for at least one stream; that part is sensitive though as in the past we managed to lose some rare cases (e.g. restart processing after a pause), and such wakeups are extremely rare so we'd better make that effort once in a while. - letting a lazy forward attempt on the frontend confirm a backend connection establishment is too smart to be reliable. That wasn't in fact the intent and it's inherited from the very old code where muxes didn't exist and where it was guaranteed that an even at this layer would wake everyone up. Here the best thing to do is to refrain from attempting to forward data until the connection is confirmed. This will let the poller report the connect() event to the backend side which will process it as it should and does in all other cases. Thanks to Jimmy Crutchfield for having reported useful traces and tested patches. This will have to be backported to all stable branches after some observation. Before 2.6 the function is stream_int_chk_snd_conn(), and the flag to remove is SI_SB_CON. --- src/conn_stream.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/conn_stream.c b/src/conn_stream.c index 5f1508dbc..78d30354e 100644 --- a/src/conn_stream.c +++ b/src/conn_stream.c @@ -805,7 +805,7 @@ static void cs_app_chk_snd_conn(struct conn_stream *cs) BUG_ON(!cs_conn(cs)); - if (unlikely(!cs_state_in(cs->state, CS_SB_CON|CS_SB_RDY|CS_SB_EST) || + if (unlikely(!cs_state_in(cs->state, CS_SB_RDY|CS_SB_EST) || (oc->flags & CF_SHUTW))) return;