Go to file
Willy Tarreau 3aab17bd56 BUG/MAJOR: connection: reset conn->owner when detaching from session list
Baptiste reported a new crash affecting 2.3 which can be triggered
when using H2 on the backend, with http-reuse always and with a tens
of clients doing close only. There are a few combined cases which cause
this to happen, but each time the issue is the same, an already freed
session is dereferenced in session_unown_conn().

Two cases were identified to cause this:
  - a connection referencing a session as its owner, which is detached
    from the session's list and is destroyed after this session ends.
    The test on conn->owner before calling session_unown_conn() is not
    sufficent as the pointer is not null but is not valid anymore.

  - a connection that never goes idle and that gets killed form the
    mux, where session_free() is called first, then conn_free() calls
    session_unown_conn() which scans the just freed session for older
    connections. This one is only triggered with DEBUG_UAF

The reason for this session to be present here is that it's needed during
the connection setup, to be passed to conn_install_mux_be() to mux->init()
as the owning session, but it's never deleted aftrewards. Furthermore, even
conn_session_free() doesn't delete this pointer after freeing the session
that lies there. Both do definitely result in a use-after-free that's more
easily triggered under DEBUG_UAF.

This patch makes sure that the owner is always deleted after detaching
or killing the session. However it is currently not possible to clear
the owner right after a synchronous init because the proxy protocol
apparently needs it (a reg test checks this), and if we leave it past
the connection setup with the session not attached anywhere, it's hard
to catch the right moment to detach it. This means that the session may
remain in conn->owner as long as the connection has never been added to
nor removed from the session's idle list. Given that this patch needs to
remain simple enough to be backported, instead it adds a workaround in
session_unown_conn() to detect that the element is already not attached
anywhere.

This fix absolutely requires previous patch "CLEANUP: connection: do not
use conn->owner when the session is known" otherwise the situation will
be even worse, as some places used to rely on conn->owner instead of the
session.

The fix could theorically be backported as far as 1.8. However, the code
in this area has significantly changed along versions and there are more
risks of breaking working stuff than fixing real issues there. The issue
was really woken up in two steps during 2.3-dev when slightly reworking
the idle conns with commit 08016ab82 ("MEDIUM: connection: Add private
connections synchronously in session server list") and when adding
support for storing used H2 connections in the session and adding the
necessary call to session_unown_conn() in the muxes. But the same test
managed to crash 2.2 when built in DEBUG_UAF and patched like this,
proving that we used to already leave dangling pointers behind us:

|  diff --git a/include/haproxy/connection.h b/include/haproxy/connection.h
|  index f8f235c1a..dd30b5f80 100644
|  --- a/include/haproxy/connection.h
|  +++ b/include/haproxy/connection.h
|  @@ -458,6 +458,10 @@ static inline void conn_free(struct connection *conn)
|                          sess->idle_conns--;
|                  session_unown_conn(sess, conn);
|          }
|  +       else {
|  +               struct session *sess = conn->owner;
|  +               BUG_ON(sess && sess->origin != &conn->obj_type);
|  +       }
|
|          sockaddr_free(&conn->src);
|          sockaddr_free(&conn->dst);

It's uncertain whether an existing code path there can lead to dereferencing
conn->owner when it's bad, though certain suspicious memory corruption bugs
make one think it's a likely candidate. The patch should not be hard to
adapt there.

Backports to 2.1 and older are left to the appreciation of the person
doing the backport.

A reproducer consists in this:

  global
    nbthread 1

  listen l
    bind :9000
    mode http
    http-reuse always
    server s 127.0.0.1:8999 proto h2

  frontend f
    bind :8999 proto h2
    mode http
    http-request return status 200

Then this will make it crash within 2-3 seconds:

  $ h1load -e -r 1 -c 10 http://0:9000/

If it does not, it might be that DEBUG_UAF was not used (it's harder then)
and it might be useful to restart.
2020-11-21 15:29:22 +01:00
.github CI: Clean up Windows CI 2020-11-21 11:05:16 +01:00
contrib CONTRIB: release-estimator: Add release estimating tool 2020-10-24 12:27:17 +02:00
doc DOC: clarify how to create a fallback crt 2020-11-21 15:29:22 +01:00
examples CLEANUP: assorted typo fixes in the code and comments 2020-06-26 11:27:28 +02:00
include BUG/MAJOR: connection: reset conn->owner when detaching from session list 2020-11-21 15:29:22 +01:00
reg-tests REGTEST: server/cli_set_ssl.vtc requires OpenSSL 2020-11-18 17:41:28 +01:00
scripts CI: travis-ci: replace not defined SSL_LIB, SSL_INC for BotringSSL builds 2020-10-11 21:12:33 +02:00
src BUG/MAJOR: connection: reset conn->owner when detaching from session list 2020-11-21 15:29:22 +01:00
tests MEDIUM: config: remove the deprecated and dangerous global "debug" directive 2020-10-09 19:18:45 +02:00
.cirrus.yml CI: cirrus-ci: exclude slow reg-tests 2020-07-04 06:58:14 +02:00
.gitattributes MINOR: Commit .gitattributes 2020-09-05 16:21:59 +02:00
.gitignore CLEANUP: Update .gitignore 2020-09-12 13:11:24 +02:00
.travis.yml CI: travis-ci: remove builds migrated to GH actions 2020-11-21 05:40:27 +01:00
BRANCHES DOC: assorted typo fixes in the documentation 2020-03-09 14:45:58 +01:00
CHANGELOG [RELEASE] Released version 2.4-dev0 2020-11-05 17:20:35 +01:00
CONTRIBUTING DOC: Use gender neutral language 2020-07-26 22:35:43 +02:00
INSTALL DOC: mention in INSTALL that it's development again 2020-11-05 17:19:13 +01:00
LICENSE LICENSE: add licence exception for OpenSSL 2012-09-07 13:52:26 +02:00
MAINTAINERS REORG: include: split hathreads into haproxy/thread.h and haproxy/thread-t.h 2020-06-11 10:18:56 +02:00
Makefile BUILD: makefile: enable crypt(3) for OpenBSD 2020-11-21 05:45:05 +01:00
README DOC: create a BRANCHES file to explain the life cycle 2019-06-15 22:00:14 +02:00
ROADMAP DOC: update the outdated ROADMAP file 2019-06-15 21:59:54 +02:00
SUBVERS BUILD: use format tags in VERDATE and SUBVERS files 2013-12-10 11:22:49 +01:00
VERDATE [RELEASE] Released version 2.3.0 2020-11-05 17:04:53 +01:00
VERSION [RELEASE] Released version 2.4-dev0 2020-11-05 17:20:35 +01:00

The HAProxy documentation has been split into a number of different files for
ease of use.

Please refer to the following files depending on what you're looking for :

  - INSTALL for instructions on how to build and install HAProxy
  - BRANCHES to understand the project's life cycle and what version to use
  - LICENSE for the project's license
  - CONTRIBUTING for the process to follow to submit contributions

The more detailed documentation is located into the doc/ directory :

  - doc/intro.txt for a quick introduction on HAProxy
  - doc/configuration.txt for the configuration's reference manual
  - doc/lua.txt for the Lua's reference manual
  - doc/SPOE.txt for how to use the SPOE engine
  - doc/network-namespaces.txt for how to use network namespaces under Linux
  - doc/management.txt for the management guide
  - doc/regression-testing.txt for how to use the regression testing suite
  - doc/peers.txt for the peers protocol reference
  - doc/coding-style.txt for how to adopt HAProxy's coding style
  - doc/internals for developer-specific documentation (not all up to date)