mirror of
https://git.haproxy.org/git/haproxy.git/
synced 2025-09-19 21:01:47 +02:00
In GH issue #2552, Christian Ruppert reported an increase in crashes with recent 3.0-dev versions, always related with stick-tables and peers. One particularity of his config is that it has a lot of peers. While trying to reproduce, it empirically was found that firing 10 load generators at 10 different haproxy instances tracking a random key among 100k against a table of max 5k entries, on 8 threads and between a total of 50 parallel peers managed to reproduce the crashes in seconds, very often in ebtree deletion or insertion code, but not only. The debugging revealed that the crashes are often caused by a parent node being corrupted while delete/insert tries to update it regarding a recently inserted/removed node, and that that corrupted node had always been proven to be deleted, then immediately freed, so it ought not be visited in the tree from functions enclosed between a pair of lock/unlock. As such the only possibility was that it had experienced unexpected inserts. Also, running with pool integrity checking would 90% of the time cause crashes during allocation based on corrupted contents in the node, likely because it was found at two places in the same tree and still present as a parent of a node being deleted or inserted (hence the __stksess_free and stktable_trash_oldest callers being visible on these items). Indeed the issue is in fact related to the test set (occasionally redundant keys, many peers). What happens is that sometimes, a same key is learned from two different peers. When it is learned for the first time, we end up in stktable_touch_with_exp() in the "else" branch, where the test for existence is made before taking the lock (since commit cfeca3a3a3 ("MEDIUM: stick-table: touch updates under an upgradable read lock") that was merged in 2.9), and from there the entry is added. But is one of the threads manages to insert it before the other thread takes the lock, then the second thread will try to insert this node again. And inserting an already inserted node will corrupt the tree (note that we never switched to enforcing a check in insertion code on this due to API history that would break various code parts). Here the solution is simple, it requires to recheck leaf_p after getting the lock, to avoid touching anything if the entry has already been inserted in the mean time. Many thanks to Christian Ruppert for testing this and for his invaluable help on this hard-to-trigger issue. This fix needs to be backported to 2.9.
The HAProxy documentation has been split into a number of different files for ease of use. Please refer to the following files depending on what you're looking for : - INSTALL for instructions on how to build and install HAProxy - BRANCHES to understand the project's life cycle and what version to use - LICENSE for the project's license - CONTRIBUTING for the process to follow to submit contributions The more detailed documentation is located into the doc/ directory : - doc/intro.txt for a quick introduction on HAProxy - doc/configuration.txt for the configuration's reference manual - doc/lua.txt for the Lua's reference manual - doc/SPOE.txt for how to use the SPOE engine - doc/network-namespaces.txt for how to use network namespaces under Linux - doc/management.txt for the management guide - doc/regression-testing.txt for how to use the regression testing suite - doc/peers.txt for the peers protocol reference - doc/coding-style.txt for how to adopt HAProxy's coding style - doc/internals for developer-specific documentation (not all up to date)
Description
Languages
C
98.1%
Shell
0.8%
Makefile
0.5%
Lua
0.2%
Python
0.2%