BUG/MEDIUM: mux-h2: ignore conn->owner when deciding if a connection is dead

Originally, valid backend connections always used to have conn->owner
pointing to the owner session. In 1.9, commit 93c885 enforced this when
implementing backend H2 support by making sure that no orphaned connection
was left on its own with no remaining stream able to handle it.
Later, idle connections were reworked so that they were no longer
necessarily attached to a stream, but could be directly in the server,
accessed via a hash, so it started to become possible to have conn->owner
left to NULL when picking such a connection. It in fact happens for
http-reuse always, when the second stream picks the connection because
its owner is NULL and it's not changed.

More recently, a case was identified where it could be theoretically
possible to reinsert a dead connection into an idle list, and commit
59c599f3f0 ("BUG/MEDIUM: mux-h2: make sure not to move a dead
connection to idle") addressed that possibility in 3.3 by adding the
h2c_is_dead() test in h2_detach() before deciding to reinsert a
connection into the idle list.

Unfortunately, the combination of changes above results in the following
sequence being possible:
  - a stream requires a connection, connect_server() creates one, sets
    conn->owner to the session, then when the session is being set up,
    the SSL stack calls conn_create_mux() which gets the session from
    conn->owner, passes it to mux->init() (h2_init), which in turn
    creates the backend stream and assigns it this session.

  - when the stream ends, it detaches (h2_detach), and the call to
    h2c_is_dead() returns false because h2c->conn->owner is set. The
    connection is thus added into the server's idle list.

  - a new stream comes, it finds the connection in the server's list,
    which doesn't require to set conn->owner, the stream is added via
    h2_attach() which passes the stream's session, and that one is
    properly set on h2s again, but never on conn->owner.

  - the stream finishes, detaches, and this time the call to h2c_is_dead()
    sees the owner is NULL, thus indicates that the connection seems dead
    so it's not added again to the idle list, and it's destroyed.

Note that this most only happens at low loads (at most one active stream
per connection, so typically at most than one active stream per thread),
where the H2 reuse ratio on a server configured with http-reuse always
or http-reuse aggressive is close to 50%. At high loads, this is much more
rare, though looking at the reuse stats for a server, it's visible that a
sustained load still shows around 1% of the connections being periodically
renewed.

Interestingly, for RHTTP the impact is more important because there
was already a work around for this test in h2c_is_dead() but it uses
conn_is_reverse(), which is never correct in this case (it should be
called conn_to_reverse() because it says the conn must be reversed
and has not yet been), so this extra test doesn't protect against the
NULL check, and connections are closed after each stream is terminated
(if there is no other stream left).

After a long analysis with Amaury and Olivier, it was concluded that:
  - the h2c_is_dead() addition is finally not the best solution and
    could be refined, however in the current state it's a bit tricky.
  - the conn->owner test in h2c_is_dead() is no longer relevant,
    probably since 2.4 when connections were stored using hash_nodes
    in the servers and would no longer depend on a session, so that
    test should be removed.
  - the test conn_is_reverse() on the same line, that was added to
    ignore the former for RHTTP, and which doesn't properly work either
    should be removed as well.

Some further cleanups should be performed to clarify this situation.

This patch implements the points above, and it should be backported
wherever commit 59c599f3f0 was backported.
This commit is contained in:
Willy Tarreau 2026-04-16 17:59:44 +02:00
parent 0af603f46f
commit a0541f5d21

View File

@ -1063,7 +1063,6 @@ h2c_is_dead(const struct h2c *h2c)
((h2c->flags & H2_CF_ERROR) || /* errors close immediately */
(h2c->flags & H2_CF_ERR_PENDING && h2c->st0 < H2_CS_FRAME_H) || /* early error during connect */
(h2c->st0 >= H2_CS_ERROR && !h2c->task) || /* a timeout stroke earlier */
(!(h2c->conn->owner) && !conn_is_reverse(h2c->conn)) || /* Nobody's left to take care of the connection, drop it now */
(!br_data(h2c->mbuf) && /* mux buffer empty, also process clean events below */
((h2c->flags & H2_CF_RCVD_SHUT) ||
h2c_reached_last_stream(h2c)))))