Commit Graph

7 Commits

Author SHA1 Message Date
Willy Tarreau
670119955b Revert "OPTIM: queue: don't call pendconn_unlink() when the pendconn is not queued"
This reverts commit b7ba1d9011. Actually
this test had already been removed in the past by commit fac0f645d
("BUG/MEDIUM: queue: make pendconn_cond_unlink() really thread-safe"),
but the condition to reproduce the bug mentioned there was not clear.

Now after analysis and a certain dose of code cleanup, things start to
appear more obvious. what happens is that if we check the presence of
the node in the tree without taking the lock, we can see the NULL at
the instant the node is being unlinked by another thread in
pendconn_process_next_strm() as part of __pendconn_unlink_prx() or
__pendconn_unlink_srv(). Till now there is no issue except that the
pendconn is not removed from the queue during this operation and that
the task is scheduled to be woken up by pendconn_process_next_strm()
with the stream being added to the list of the server's active
connections by __stream_add_srv_conn(). The first thread finishes
faster and gets back to stream_free() faster than the second one
sets the srv_conn on the stream, so stream_free() skips the s->srv_conn
test and doesn't try to dequeue the freshly queued entry. At the
very least a barrier would be needed there but we can't afford to
free the stream while it's being queued. So there's no other solution
than making sure that either __pendconn_unlink_prx() or
pendconn_cond_unlink() get the entry but never both, which is why the
lock is required around the test. A possible solution would be to set
p->target before unlinking the entry and using it to complete the test.
This would leave no dead period where the pendconn is not seen as
attached.

It is possible, yet extremely difficult, to reproduce this bug, which
was first noticed in bug #880. Running 100 servers with maxconn 1 and
maxqueue 1 on leastconn and a connect timeout of 30ms under 16 threads
with DEBUG_UAF, with a traffic making the backend's queue oscillate
around zero (typically using 250 connections with a local httpterm
server) may rarely manage to trigger a use-after-free.

No backport is needed.
2020-10-23 09:21:55 +02:00
Willy Tarreau
b7ba1d9011 OPTIM: queue: don't call pendconn_unlink() when the pendconn is not queued
On connection error processing, we can see massive storms of calls to
pendconn_cond_unlink() to release a possible place in the queue. For
example, in issue #908, on average half of the threads are caught in
this function via back_try_conn_req() consecutive to a synchronous
error. However we wait until grabbing the lock to know if the pendconn
is effectively in a queue, which is expensive for many cases. We know
the transition may only happen from in-queue to out-of-queue so it's safe
to first run a preliminary check to see if it's worth going further. This
will allow to avoid the cost of locking for most requests. This should
not change anything for those completing correctly as they're already
run through pendconn_free() which doesn't call pendconn_cond_unlink()
unless deemed necessary.
2020-10-22 17:32:28 +02:00
Willy Tarreau
fac0f645df BUG/MEDIUM: queue: make pendconn_cond_unlink() really thread-safe
A crash reported in github issue #880 looks impossible unless
pendconn_cond_unlink() occasionally sees a null leaf_p when attempting
to remove an entry, which seems to be confirmed by the reporter. What
seems to be happening is that depending on compiler optimizations,
this pointer can appear as null while pointers are moved if one of
the node's parents is removed from or inserted into the tree. There's
no explicit null of the pointer during these operations but those
pointers are rewritten in multiple steps and nothing prevents this
situation from happening, and there are no particular barrier nor
atomic ops around this.

This test was used to avoid unnecessary locking, for already deleted
entries, but looking at the code it appears that pendconn_free() already
resets s->pend_pos that's used as <p> there, and that the other call
reasons are after an error where the connection will be dropped as
well. So we don't save anything by doing this test, and make it
unsafe. The older code used to check for list emptiness there and
not inside pendconn_unlink(), which explains why the code has stayed
there. Let's just remove this now.

Thanks to @jaroslawr for reporting this issue in great details and for
testing the proposed fix.

This should be backpored to 1.8, where the test on LIST_ISEMPTY should
be moved to pendconn_unlink() instead (inside the lock, just like 2.0+).
2020-10-02 18:10:26 +02:00
Willy Tarreau
b2551057af CLEANUP: include: tree-wide alphabetical sort of include files
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
2020-06-11 10:18:59 +02:00
Willy Tarreau
dfd3de8826 REORG: include: move stream.h to haproxy/stream{,-t}.h
This one was not easy because it was embarking many includes with it,
which other files would automatically find. At least global.h, arg.h
and tools.h were identified. 93 total locations were identified, 8
additional includes had to be added.

In the rare files where it was possible to finalize the sorting of
includes by adjusting only one or two extra lines, it was done. But
all files would need to be rechecked and cleaned up now.

It was the last set of files in types/ and proto/ and these directories
must not be reused anymore.
2020-06-11 10:18:58 +02:00
Willy Tarreau
1e56f92693 REORG: include: move server.h to haproxy/server{,-t}.h
extern struct dict server_name_dict was moved from the type file to the
main file. A handful of inlined functions were moved at the bottom of
the file. Call places were updated to use server-t.h when relevant, or
to simply drop the entry when not needed.
2020-06-11 10:18:58 +02:00
Willy Tarreau
a55c45470f REORG: include: move queue.h to haproxy/queue{,-t}.h
Nothing outstanding here. A number of call places were not justified and
removed.
2020-06-11 10:18:58 +02:00