mirror of
https://git.haproxy.org/git/haproxy.git/
synced 2025-08-10 00:57:02 +02:00
BUG/MINOR: conn_stream: do not confirm a connection from the frontend path
In issue #1468 it was reported that sometimes server-side connection attempts were only validated after the "timeout connect" value, and that would only happen with an H2 client. A long code analysis with the output dumps showed only one possible call path: an I/O event on the frontend while reading had just been disabled calls h2_wake() which in turns wakes cs_conn_io_cb(), which tries cs_conn_process() and cs_notify(), which sees that the other side is not blocked (already in CS_ST_CON) and tries cs_chk_snd() on it. But on that side the connection had just finished to be set up and not yet woken the stream up, cs_notify() would then call cs_conn_send() which succeeds and passes the connection to CS_ST_RDY. The problem is that nothing new happened on the frontend side so there's no reason to wake the stream up and the backend-side conn_stream remains in CS_ST_RDY state with the stream never being woken up. Once the "timeout connect" strikes, process_stream() is woken up and finds the connection finally setup, so it ignores the timeout and goes on. The number of conditions to meet to reproduce this is huge, which also explains why the reporter says it's "occasional" and we were never able to reproduce it in the lab. It needs at least reads to be disabled and immediately re-enabled on the frontend side (e.g. buffer full) with an I/O even reported before the poller had an opportunity to be disabled but with no subscribe being reinstalled, so that sock_conn_iocb() has no other choice but calling h2_wake(), and exactly at the same time the backend connection must finish to set up so that it was not yet reported by the poller, the data were sent and the polling for writes disabled. Several factors are to be considered here: - h2_wake() should probably not call h2_wake_some_streams() for ret >= 0 (common case), but only if some special event is reported for at least one stream; that part is sensitive though as in the past we managed to lose some rare cases (e.g. restart processing after a pause), and such wakeups are extremely rare so we'd better make that effort once in a while. - letting a lazy forward attempt on the frontend confirm a backend connection establishment is too smart to be reliable. That wasn't in fact the intent and it's inherited from the very old code where muxes didn't exist and where it was guaranteed that an even at this layer would wake everyone up. Here the best thing to do is to refrain from attempting to forward data until the connection is confirmed. This will let the poller report the connect() event to the backend side which will process it as it should and does in all other cases. Thanks to Jimmy Crutchfield for having reported useful traces and tested patches. This will have to be backported to all stable branches after some observation. Before 2.6 the function is stream_int_chk_snd_conn(), and the flag to remove is SI_SB_CON.
This commit is contained in:
parent
0055d5693e
commit
4173f4ea29
@ -805,7 +805,7 @@ static void cs_app_chk_snd_conn(struct conn_stream *cs)
|
||||
|
||||
BUG_ON(!cs_conn(cs));
|
||||
|
||||
if (unlikely(!cs_state_in(cs->state, CS_SB_CON|CS_SB_RDY|CS_SB_EST) ||
|
||||
if (unlikely(!cs_state_in(cs->state, CS_SB_RDY|CS_SB_EST) ||
|
||||
(oc->flags & CF_SHUTW)))
|
||||
return;
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user