Commit Graph

16 Commits

Author SHA1 Message Date
Ilya Shipitsin
4a689dad03 CLEANUP: assorted typo fixes in the code and comments
This is 32nd iteration of typo fixes
2022-10-30 17:17:56 +01:00
Willy Tarreau
cdc83e0192 MINOR: queue: add a pointer to the server and the proxy in the queue
A queue is specific to a server or a proxy, so we don't need to place
this distinction inside all pendconns, it can be in the queue itself.
This commit adds the relevant fields "px" and "sv" into the struct
queue, and initializes them accordingly.
2021-06-24 10:52:31 +02:00
Willy Tarreau
df3b0cbe31 MINOR: queue: add queue_init() to initialize a queue
This is better and cleaner than open-coding this in the server and
proxy code, where it has all chances of becoming wrong once forgotten.
2021-06-24 10:52:31 +02:00
Willy Tarreau
9ab78293bf MEDIUM: queue: simplify again the process_srv_queue() API (v2)
This basically undoes the API changes that were performed by commit
0274286dd ("BUG/MAJOR: server: fix deadlock when changing maxconn via
agent-check") to address the deadlock issue: since process_srv_queue()
doesn't use the server lock anymore, it doesn't need the "server_locked"
argument, so let's get rid of it before it gets used again.
2021-06-24 10:52:31 +02:00
Willy Tarreau
ccd85a3e08 Revert "MEDIUM: queue: simplify again the process_srv_queue() API"
This reverts commit c83e45e9b0.

The recent changes since 5304669e1 MEDIUM: queue: make
pendconn_process_next_strm() only return the pendconn opened a tiny race
condition between stream_free() and process_srv_queue(), as the pendconn
is accessed outside of the lock, possibly while it's being freed. A
different approach is required.
2021-06-24 07:22:18 +02:00
Willy Tarreau
c83e45e9b0 MEDIUM: queue: simplify again the process_srv_queue() API
This basically undoes the API changes that were performed by commit
0274286dd ("BUG/MAJOR: server: fix deadlock when changing maxconn via
agent-check") to address the deadlock issue: since process_srv_queue()
doesn't use the server lock anymore, it doesn't need the "server_locked"
argument, so let's get rid of it before it gets used again.
2021-06-22 18:57:15 +02:00
Willy Tarreau
a05704582c MINOR: server: replace the pendconns-related stuff with a struct queue
Just like for proxies, all three elements (pendconns, nbpend, queue_idx)
were moved to struct queue.
2021-06-22 18:43:14 +02:00
Willy Tarreau
7f3c1df248 MINOR: proxy: replace the pendconns-related stuff with a struct queue
All three elements (pendconns, nbpend, queue_idx) were moved to struct
queue.
2021-06-22 18:43:14 +02:00
Amaury Denoyelle
0274286dd3 BUG/MAJOR: server: fix deadlock when changing maxconn via agent-check
The server_parse_maxconn_change_request locks the server lock. However,
this function can be called via agent-checks or lua code which already
lock it. This bug has been introduced by the following commit :

  commit 79a88ba3d0
  BUG/MAJOR: server: prevent deadlock when using 'set maxconn server'

This commit tried to fix another deadlock with can occur because
previoulsy server_parse_maxconn_change_request requires the server lock
to be held. However, it may call internally process_srv_queue which also
locks the server lock. The locking policy has thus been updated. The fix
is functional for the CLI 'set maxconn' but fails to address the
agent-check / lua counterparts.

This new issue is fixed in two steps :
- changes from the above commit have been reverted. This means that
  server_parse_maxconn_change_request must again be called with the
  server lock.

- to counter the deadlock fixed by the above commit, process_srv_queue
  now takes an argument to render the server locking optional if the
  caller already held it. This is only used by
  server_parse_maxconn_change_request.

The above commit was subject to backport up to 1.8. Thus this commit
must be backported in every release where it is already present.
2021-06-22 11:39:20 +02:00
Willy Tarreau
670119955b Revert "OPTIM: queue: don't call pendconn_unlink() when the pendconn is not queued"
This reverts commit b7ba1d9011. Actually
this test had already been removed in the past by commit fac0f645d
("BUG/MEDIUM: queue: make pendconn_cond_unlink() really thread-safe"),
but the condition to reproduce the bug mentioned there was not clear.

Now after analysis and a certain dose of code cleanup, things start to
appear more obvious. what happens is that if we check the presence of
the node in the tree without taking the lock, we can see the NULL at
the instant the node is being unlinked by another thread in
pendconn_process_next_strm() as part of __pendconn_unlink_prx() or
__pendconn_unlink_srv(). Till now there is no issue except that the
pendconn is not removed from the queue during this operation and that
the task is scheduled to be woken up by pendconn_process_next_strm()
with the stream being added to the list of the server's active
connections by __stream_add_srv_conn(). The first thread finishes
faster and gets back to stream_free() faster than the second one
sets the srv_conn on the stream, so stream_free() skips the s->srv_conn
test and doesn't try to dequeue the freshly queued entry. At the
very least a barrier would be needed there but we can't afford to
free the stream while it's being queued. So there's no other solution
than making sure that either __pendconn_unlink_prx() or
pendconn_cond_unlink() get the entry but never both, which is why the
lock is required around the test. A possible solution would be to set
p->target before unlinking the entry and using it to complete the test.
This would leave no dead period where the pendconn is not seen as
attached.

It is possible, yet extremely difficult, to reproduce this bug, which
was first noticed in bug #880. Running 100 servers with maxconn 1 and
maxqueue 1 on leastconn and a connect timeout of 30ms under 16 threads
with DEBUG_UAF, with a traffic making the backend's queue oscillate
around zero (typically using 250 connections with a local httpterm
server) may rarely manage to trigger a use-after-free.

No backport is needed.
2020-10-23 09:21:55 +02:00
Willy Tarreau
b7ba1d9011 OPTIM: queue: don't call pendconn_unlink() when the pendconn is not queued
On connection error processing, we can see massive storms of calls to
pendconn_cond_unlink() to release a possible place in the queue. For
example, in issue #908, on average half of the threads are caught in
this function via back_try_conn_req() consecutive to a synchronous
error. However we wait until grabbing the lock to know if the pendconn
is effectively in a queue, which is expensive for many cases. We know
the transition may only happen from in-queue to out-of-queue so it's safe
to first run a preliminary check to see if it's worth going further. This
will allow to avoid the cost of locking for most requests. This should
not change anything for those completing correctly as they're already
run through pendconn_free() which doesn't call pendconn_cond_unlink()
unless deemed necessary.
2020-10-22 17:32:28 +02:00
Willy Tarreau
fac0f645df BUG/MEDIUM: queue: make pendconn_cond_unlink() really thread-safe
A crash reported in github issue #880 looks impossible unless
pendconn_cond_unlink() occasionally sees a null leaf_p when attempting
to remove an entry, which seems to be confirmed by the reporter. What
seems to be happening is that depending on compiler optimizations,
this pointer can appear as null while pointers are moved if one of
the node's parents is removed from or inserted into the tree. There's
no explicit null of the pointer during these operations but those
pointers are rewritten in multiple steps and nothing prevents this
situation from happening, and there are no particular barrier nor
atomic ops around this.

This test was used to avoid unnecessary locking, for already deleted
entries, but looking at the code it appears that pendconn_free() already
resets s->pend_pos that's used as <p> there, and that the other call
reasons are after an error where the connection will be dropped as
well. So we don't save anything by doing this test, and make it
unsafe. The older code used to check for list emptiness there and
not inside pendconn_unlink(), which explains why the code has stayed
there. Let's just remove this now.

Thanks to @jaroslawr for reporting this issue in great details and for
testing the proposed fix.

This should be backpored to 1.8, where the test on LIST_ISEMPTY should
be moved to pendconn_unlink() instead (inside the lock, just like 2.0+).
2020-10-02 18:10:26 +02:00
Willy Tarreau
b2551057af CLEANUP: include: tree-wide alphabetical sort of include files
This patch fixes all the leftovers from the include cleanup campaign. There
were not that many (~400 entries in ~150 files) but it was definitely worth
doing it as it revealed a few duplicates.
2020-06-11 10:18:59 +02:00
Willy Tarreau
dfd3de8826 REORG: include: move stream.h to haproxy/stream{,-t}.h
This one was not easy because it was embarking many includes with it,
which other files would automatically find. At least global.h, arg.h
and tools.h were identified. 93 total locations were identified, 8
additional includes had to be added.

In the rare files where it was possible to finalize the sorting of
includes by adjusting only one or two extra lines, it was done. But
all files would need to be rechecked and cleaned up now.

It was the last set of files in types/ and proto/ and these directories
must not be reused anymore.
2020-06-11 10:18:58 +02:00
Willy Tarreau
1e56f92693 REORG: include: move server.h to haproxy/server{,-t}.h
extern struct dict server_name_dict was moved from the type file to the
main file. A handful of inlined functions were moved at the bottom of
the file. Call places were updated to use server-t.h when relevant, or
to simply drop the entry when not needed.
2020-06-11 10:18:58 +02:00
Willy Tarreau
a55c45470f REORG: include: move queue.h to haproxy/queue{,-t}.h
Nothing outstanding here. A number of call places were not justified and
removed.
2020-06-11 10:18:58 +02:00