haproxy

mirror of https://git.haproxy.org/git/haproxy.git/ synced 2026-04-21 03:01:02 +02:00

Aurelien DARRAGON 7f01f0a8ef BUG/MEDIUM: proxy/sktable: prevent watchdog trigger on soft-stop

During soft-stop, manage_proxy() (p->task) will try to purge
trashable (expired and not referenced) sticktable entries,
effectively releasing the process memory to leave some space
for new processes.

This is done by calling stktable_trash_oldest(), immediately
followed by a pool_gc() to give the memory back to the OS.

As already mentioned in dfe7925 ("BUG/MEDIUM: stick-table:
limit the time spent purging old entries"), calling
stktable_trash_oldest() with a huge batch can result in the function
spending too much time searching and purging entries, and ultimately
triggering the watchdog.

Recently, an internal issue was reported in which the watchdog was
triggered in stktable_trash_oldest() on soft-stop (thus initiated by
manage_proxy()).

According to the report, the crash seems to only occur since 5938021
("BUG/MEDIUM: stick-table: do not leave entries in end of window during purge").

This could be the result of stktable_trash_oldest() now working
as expected, and thus spending a large amount of time purging
entries when called with a large enough <to_batch>.

Instead of adding new checks in stktable_trash_oldest(), here we
chose to address the issue directly in manage_proxy().

Since the stktable_trash_oldest() function is called with
<to_batch> == <p->table->current>, it can clearly cause issues during
soft-stop: a large table that was full prior to the soft-stop may
suddenly see most of its entries become trashable because of the
soft-stop, and all of them would be purged in a single call.

Moreover, we should note that the call to stktable_trash_oldest() is
immediately followed by a call to pool_gc(). Since pool_gc() involves
malloc_trim() on glibc, it is rather expensive, and the more memory
there is to reclaim, the longer the call takes.

We need to ensure that stktable_trash_oldest() and the subsequent
pool_gc() call together fit in a single task execution window
to avoid contention, and thus prevent the watchdog from being triggered.

To do this, we now allocate a "budget" for each purging attempt.
The budget is capped at 32K, which means that each sticktable cleanup
attempt will trash at most 32K entries.

The 32K value is quite arbitrary here, and might need to be adjusted or
even derived from other parameters if this fails to properly address
the issue without introducing new side-effects.
The goal is to find a good balance between the max duration of each
cleanup batch and the frequency of (expensive) pool_gc() calls.

If most of the budget is actually spent trashing entries, then the task
will immediately be rescheduled to continue the purge.
This way, the purge is effectively batched over multiple task runs.
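
The budgeted purge described above can be illustrated with a small
standalone sketch. This is not the code from the commit: the toy_*
names, the struct and the PURGE_BUDGET_MAX macro are hypothetical, the
table is reduced to two counters, and reading "most of the budget" as
"more than half of it" is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cap on entries trashed per task run (32K, as in the text). */
#define PURGE_BUDGET_MAX (1u << 15)

/* Toy stand-in for a stick-table: only the counters the sketch needs. */
struct toy_table {
	unsigned int current;   /* entries currently stored */
	unsigned int trashable; /* entries that are expired and unreferenced */
};

/* Toy stand-in for stktable_trash_oldest(): trash up to <to_batch>
 * entries and return how many were actually trashed.
 */
static unsigned int toy_trash_oldest(struct toy_table *t, unsigned int to_batch)
{
	unsigned int n = t->trashable < to_batch ? t->trashable : to_batch;

	t->trashable -= n;
	t->current -= n;
	return n;
}

/* One purge attempt. Returns 1 if the task should be rescheduled right
 * away to continue the purge, 0 otherwise. <gc_calls> counts how many
 * times the (expensive) memory reclaim step would have run.
 */
static int toy_purge_once(struct toy_table *t, unsigned int *gc_calls)
{
	unsigned int budget, cleaned_up;

	assert(t != NULL && gc_calls != NULL);

	/* never ask for more than the cap, whatever the table size */
	budget = t->current < PURGE_BUDGET_MAX ? t->current : PURGE_BUDGET_MAX;
	cleaned_up = toy_trash_oldest(t, budget);

	if (!cleaned_up)
		return 0;   /* nothing freed: skip the reclaim step */

	(*gc_calls)++;      /* stands in for the pool_gc() call */

	/* Assumption: "most of the budget" means more than half of it. */
	return cleaned_up > budget / 2;
}

/* Repeat purge attempts the way a scheduler would re-run the task.
 * Returns the number of task runs that were needed.
 */
static unsigned int toy_purge_all(struct toy_table *t, unsigned int *gc_calls)
{
	unsigned int runs = 1;

	while (toy_purge_once(t, gc_calls))
		runs++;
	return runs;
}
```

With a toy table holding 100000 entries that are all trashable,
toy_purge_all() needs 5 runs and 4 reclaim steps: three full 32K
batches, one batch for the remaining 1696 entries, and a last run that
finds nothing to trash. Each run is bounded by the cap, which is the
property the commit relies on to stay within a task execution window.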

This may be slowly backported to all stable versions.
[Please note that this commit depends on 6e1fe25 ("MINOR: proxy/pool:
prevent unnecessary calls to pool_gc()")]

2023-03-31 07:05:08 +02:00

.github

CI: Reformat matrix.py using black

2023-01-03 16:28:34 +01:00

addons

MINOR: stconn: Always report READ/WRITE event on shutr/shutw

2023-02-22 15:59:16 +01:00

admin

BUILD: halog: fix missing double-quote at end of help line

2022-11-25 11:11:41 +01:00

dev

MEDIUM: ring: make the offset relative to the head/tail instead of absolute

2023-02-24 09:26:30 +01:00

doc

MINOR: http_fetch: Add case-insensitive argument for url_param/urlp_val

2023-03-30 14:11:25 +02:00

examples

EXAMPLES: remove completely outdated acl-content-sw.cfg

2022-05-30 18:14:24 +02:00

include

MINOR: http_fetch: add case insensitive support for smp_fetch_url_param

2023-03-30 14:11:10 +02:00

reg-tests

REGTESTS : Add test support for case insentitive for url_param

2023-03-30 15:32:14 +02:00

scripts

SCRIPTS: run-regtests: add a version check

2022-11-30 18:44:33 +01:00

src

BUG/MEDIUM: proxy/sktable: prevent watchdog trigger on soft-stop

2023-03-31 07:05:08 +02:00

tests

TESTS: add a unit test for one_among_mask()

2022-06-21 20:29:57 +02:00

.cirrus.yml

CI: cirrus-ci: bump FreeBSD image to 13-1

2022-09-09 13:30:17 +02:00

.gitattributes

MINOR: Configure the cpp userdiff driver for *.[ch] in .gitattributes

2021-02-22 18:17:57 +01:00

.gitignore

CLEANUP: exclude udp-perturb with .gitignore

2022-09-16 15:47:04 +02:00

.mailmap

DOC: update Tim's address in .mailmap

2021-09-16 09:14:14 +02:00

.travis.yml

CI: travis-ci: temporarily disable arm64 builds

2021-08-07 07:28:15 +02:00

BRANCHES

DOC: fix some spelling issues over multiple files

2021-01-08 14:53:47 +01:00

CHANGELOG

[RELEASE] Released version 2.8-dev6

2023-03-28 13:58:56 +02:00

CONTRIBUTING

CLEANUP: assorted typo fixes in the code and comments

2021-08-16 12:37:59 +02:00

INSTALL

MINOR: version: mention that it's development again

2022-12-01 15:24:10 +01:00

LICENSE

LICENSE: add licence exception for OpenSSL

2012-09-07 13:52:26 +02:00

MAINTAINERS

CLEANUP: assorted typo fixes in the code and comments

2022-11-30 14:02:36 +01:00

Makefile

BUILD: da: extends CFLAGS to support API v3 from 3.1.7 and onwards.

2023-03-28 08:40:34 +02:00

README

DOC: create a BRANCHES file to explain the life cycle

2019-06-15 22:00:14 +02:00

SUBVERS

BUILD: use format tags in VERDATE and SUBVERS files

2013-12-10 11:22:49 +01:00

VERDATE

[RELEASE] Released version 2.8-dev6

2023-03-28 13:58:56 +02:00

VERSION

[RELEASE] Released version 2.8-dev6

2023-03-28 13:58:56 +02:00

README

The HAProxy documentation has been split into a number of different files for
ease of use.

Please refer to the following files depending on what you're looking for:

  - INSTALL for instructions on how to build and install HAProxy
  - BRANCHES to understand the project's life cycle and what version to use
  - LICENSE for the project's license
  - CONTRIBUTING for the process to follow to submit contributions

The more detailed documentation is located in the doc/ directory:

  - doc/intro.txt for a quick introduction to HAProxy
  - doc/configuration.txt for the configuration reference manual
  - doc/lua.txt for the Lua reference manual
  - doc/SPOE.txt for how to use the SPOE engine
  - doc/network-namespaces.txt for how to use network namespaces under Linux
  - doc/management.txt for the management guide
  - doc/regression-testing.txt for how to use the regression testing suite
  - doc/peers.txt for the peers protocol reference
  - doc/coding-style.txt for how to adopt HAProxy's coding style
  - doc/internals for developer-specific documentation (not all up to date)

Languages

C 98.1%

Shell 0.9%

Makefile 0.5%

Lua 0.2%

Python 0.1%