Mirror of https://git.haproxy.org/git/haproxy.git/ (synced 2025-08-05 22:56:57 +02:00)
DOC: watchdog: update the doc to reflect the recent changes
The watchdog was improved and fixed a few months ago, but the doc had not been updated to reflect this. That's now done.
parent e399daa67e
commit f5ed309449
@@ -21,7 +21,7 @@ falls back to CLOCK_REALTIME. The former is more accurate as it really counts
 the time spent in the process, while the latter might also account for time
 stuck on paging in etc.
 
-Then wdt_ping() is called to arm the timer. t's set to trigger every
+Then wdt_ping() is called to arm the timer. It's set to trigger every
 <wdt_warn_blocked_traffic_ns> interval. It is also called by wdt_handler()
 to reprogram a new wakeup after it has ticked.
 
@@ -37,15 +37,18 @@ If the thread was not marked as stuck, it's verified that no progress was made
 for at least one second, in which case the TH_FL_STUCK flag is set. The lack of
 progress is measured by the distance between the thread's current cpu_time and
 its prev_cpu_time. If the lack of progress is at least as large as the warning
-threshold and no context switch happened since last call, ha_stuck_warning() is
-called to emit a warning about that thread. In any case the context switch
-counter for that thread is updated.
+threshold, then the signal is bounced to the faulty thread if it's not the
+current one. Since this bounce is based on the time spent without update, it
+already doesn't happen often.
 
-If the thread was already marked as stuck, then the thread is considered as
-definitely stuck. Then ha_panic() is directly called if the thread is the
-current one, otherwise ha_kill() is used to resend the signal directly to the
-target thread, which will in turn go through this handler and handle the panic
-itself.
+Once on the faulty thread, two checks are performed:
+1) if the thread was already marked as stuck, then the thread is considered
+as definitely stuck, and ha_panic() is called. It will not return.
+
+2) a check is made to verify if the scheduler is still ticking, by reading
+and setting a variable that only the scheduler can clear when leaving a
+task. If the scheduler didn't make any progress, ha_stuck_warning() is
+called to emit a warning about that thread.
 
 Most of the time there's no panic of course, and a wdt_ping() is performed
 before leaving the handler to reprogram a check for that thread.
@@ -61,12 +64,12 @@ set TAINTED_WARN_BLOCKED_TRAFFIC.
 
 ha_panic() uses the current thread's trash buffer to produce the messages, as
 we don't care about its contents since that thread will never return. However
-ha_stuck_warning() instead uses a local 4kB buffer in the thread's stack.
+ha_stuck_warning() instead uses a local 8kB buffer in the thread's stack.
 ha_panic() will call ha_thread_dump_fill() for each thread, to complete the
 buffer being filled with each thread's dump messages. ha_stuck_warning() only
-calls the function for the current thread. In both cases the message is then
-directly sent to fd #2 (stderr) and ha_thread_dump_one() is called to release
-the dumped thread.
+calls ha_thread_dump_one(), which works on the current thread. In both cases
+the message is then directly sent to fd #2 (stderr) and ha_thread_dump_done()
+is called to release the dumped thread.
 
 Both print a few extra messages, but ha_panic() just ends by looping on abort()
 until the process dies.
@@ -110,13 +113,19 @@ ha_dump_backtrace() before returning.
 ha_dump_backtrace() produces a backtrace into a local buffer (100 entries max),
 then dumps the code bytes nearby the crashing instrution, dumps pointers and
 tries to resolve function names, and sends all of that into the target buffer.
+On some architectures (x86_64, arm64), it will also try to detect and decode
+call instructions and resolve them to called functions.
 
 3. Improvements
 ---------------
 
 The symbols resolution is extremely expensive, particularly for the warnings
 which should be fast. But we need it, it's just unfortunate that it strikes at
-the wrong moment.
+the wrong moment. At least ha_dump_backtrace() does disable signals while it's
+resolving, in order to avoid unwanted re-entrance. In addition, the called
+function resolve_sym_name() uses some locking and refrains from calling the
+dladdr family of functions in a re-entrant way (in the worst case only well
+known symbols will be resolved)..
 
 In an ideal case, ha_dump_backtrace() would dump the pointers to a local array,
 which would then later be resolved asynchronously in a tasklet. This can work
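
Note: the handler flow described in the updated text of the second hunk can be
summarized with a small C sketch. This is not HAProxy's code: every identifier
below (sk_thread, sk_wdt_check(), sk_sched_leave_task(), the SK_* constants) is
hypothetical, and the exact ordering of the checks is an assumption drawn from
the documentation text only. The function just returns which action the
handler would take, so that the decision logic can be read in isolation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not HAProxy's definitions. */
#define SK_FL_STUCK   0x1u                /* plays the role of TH_FL_STUCK */
#define SK_ONE_SEC_NS 1000000000ULL

enum sk_action { SK_REARM, SK_BOUNCE, SK_WARN, SK_PANIC };

struct sk_thread {
	unsigned int flags;
	uint64_t cpu_time;      /* current CPU time, in ns */
	uint64_t prev_cpu_time; /* CPU time at the last observed progress, in ns */
	int sched_marker;       /* set by the watchdog, cleared by the scheduler */
};

/* What the scheduler would do when leaving a task (assumed). */
static void sk_sched_leave_task(struct sk_thread *t)
{
	t->sched_marker = 0;
}

/* Returns the action the watchdog handler would take for thread <t>.
 * <on_faulty_thread> is non-zero when the handler runs on <t> itself.
 */
static enum sk_action sk_wdt_check(struct sk_thread *t, int on_faulty_thread,
                                   uint64_t warn_ns)
{
	uint64_t stalled;
	int marker_was_set;

	assert(t != NULL);
	stalled = t->cpu_time - t->prev_cpu_time;

	if (stalled < warn_ns)
		return SK_REARM;        /* enough progress: just re-arm the timer */

	if (!on_faulty_thread)
		return SK_BOUNCE;       /* resend the signal to the faulty thread */

	if (t->flags & SK_FL_STUCK)
		return SK_PANIC;        /* check 1: already marked stuck */

	if (stalled >= SK_ONE_SEC_NS)
		t->flags |= SK_FL_STUCK; /* mark now, panic on the next tick */

	/* check 2: read and set the marker that only the scheduler clears */
	marker_was_set = t->sched_marker;
	t->sched_marker = 1;
	return marker_was_set ? SK_WARN : SK_REARM;
}
```

In this sketch a warning is only reported when the marker set during a previous
tick is still present, i.e. when the scheduler did not leave any task in
between, and a panic requires the stuck flag to have been set by an earlier
tick.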