In peer_treat_updatemsg(), we call stktable_touch_remote() after releasing the write lock on the TS, asking it to decrement the refcnt, then we update ts->seen. Unfortunately this is racy and causes the issue that Christian reported in issue #2959. The sequence of events is very hard to trigger manually, but what happens is the following: T1. stktable_touch_remote(table, ts, 1); -> at this point the entry is in the mt_list, and the refcnt is zero. T2. stktable_trash_oldest() or process_table_expire() -> these can run, because the refcnt is now zero. The entry is cleanly deleted and freed. T1. HA_ATOMIC_STORE(&ts->seen, 1) -> we dereference freed memory. A first attempt at a fix was made by keeping the refcnt held during all the time the entry is in the mt_list, but this is expensive as such entries cannot be purged, causing lots of skips during trash_oldest_data(). This managed to trigger watchdogs, and was only hiding the real cause of the problem. The correct approach clearly is to maintain the ref_cnt until we touch ->seen. That's what this patch does. It does not decrement the refcnt, while calling stktable_touch_remote(), and does it manually after touching ->seen. With this the problem is gone. Note that a reproducer involves the following: - a config with 10 stick-ctr tracking the same table with a random key between 10M and 100M depending on the machine. - the expiration should be between 10 and 20s. http_req_cnt is stored and shared with the peers. - 4 total processes with such a config on the local machine, each corresponding to a different peer. 3 of the peers are bound to half of the cores (all threads) and share the same threads; the last process is bound to the other half with its own threads. - injecting at full load, ~256 conn, on the shared listening port. After ~2x expiration time to 1 minute the lone process should segfault in pools code due to a corrupted by_lru list. This problem already exists in earlier versions but the race looks narrower. Given how difficult it is to trigger on a given machine in its current form, it's likely that it only happens once in a while on stable branches. The fix must be backported wherever the code is similar, and there's no hope to reproduce it to validate the backport. Thanks again to Christian for his amazing help!
HAProxy
HAProxy is a free, very fast and reliable reverse-proxy offering high availability, load balancing, and proxying for TCP and HTTP-based applications.
Installation
The INSTALL file describes how to build HAProxy. A list of packages is also available on the wiki.
Getting help
The discourse and the mailing-list are available for questions or configuration assistance. You can also use the slack or IRC channel. Please don't use the issue tracker for these.
The issue tracker is only for bug reports or feature requests.
Documentation
The HAProxy documentation has been split into a number of different files for ease of use. It is available in text format as well as HTML. The wiki is also meant to replace the old architecture guide.
Please refer to the following files depending on what you're looking for:
- INSTALL for instructions on how to build and install HAProxy
- BRANCHES to understand the project's life cycle and what version to use
- LICENSE for the project's license
- CONTRIBUTING for the process to follow to submit contributions
The more detailed documentation is located into the doc/ directory:
- doc/intro.txt for a quick introduction on HAProxy
- doc/configuration.txt for the configuration's reference manual
- doc/lua.txt for the Lua's reference manual
- doc/SPOE.txt for how to use the SPOE engine
- doc/network-namespaces.txt for how to use network namespaces under Linux
- doc/management.txt for the management guide
- doc/regression-testing.txt for how to use the regression testing suite
- doc/peers.txt for the peers protocol reference
- doc/coding-style.txt for how to adopt HAProxy's coding style
- doc/internals for developer-specific documentation (not all up to date)
License
HAProxy is licensed under GPL 2 or any later version, the headers under LGPL 2.1. See the LICENSE file for a more detailed explanation.