mirror of
https://github.com/prometheus/prometheus.git
synced 2026-02-13 20:01:50 +01:00
Try fixing potential deadlocks in discovery
Manager.ApplyConfig() uses multiple locks: - Provider.mu - Manager.targetsMtx Manager.cleaner() uses the same locks but in the opposite order: - First it locks Manager.targetsMtx - The it locks Provider.mu I've seen a few strange cases of Prometheus hanging up on shutdown and never compliting that shutdown. From a few traces I was given it appears that while Prometheus is still running only discovery.Manager and notifier.Manager are running running. From that trace it also seems like they are stuck on a lock from two functions: - cleaner waits on a RLock() - ApplyConfig waits on a Lock() I cannot reproduce it but I suspect this is a race between locks. Imagine this scenario: - Manager.ApplyConfig() is called - Manager.ApplyConfig locks Provider.mu.Lock() - at the same time cleaner() is called on the same Provider instance and it calls Manager.targetsMtx.Lock() - Manager.ApplyConfig() now calls Manager.targetsMtx.Lock() but that lock is already held by cleaner() function so ApplyConfig() hangs there - at the same time cleaner() now wants to lock Provider.mu.Rlock() but that lock is already held by Manager.ApplyConfig() - we end up with both functions locking each other out without any way to break that lock Re-order lock calls to try to avoid this scenario. I tried writing a test case for it but couldn't hit this issue. Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
This commit is contained in:
parent
8b0d33e5b2
commit
7d55ee8cc8
@ -306,13 +306,13 @@ func (m *Manager) startProvider(ctx context.Context, p *Provider) {
|
||||
|
||||
// cleaner cleans resources associated with provider.
|
||||
func (m *Manager) cleaner(p *Provider) {
|
||||
m.targetsMtx.Lock()
|
||||
p.mu.RLock()
|
||||
m.targetsMtx.Lock()
|
||||
for s := range p.subs {
|
||||
delete(m.targets, poolKey{s, p.name})
|
||||
}
|
||||
p.mu.RUnlock()
|
||||
m.targetsMtx.Unlock()
|
||||
p.mu.RUnlock()
|
||||
if p.done != nil {
|
||||
p.done()
|
||||
}
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user