Fix flaky TestStop_DrainingEnabled and TestStop_DrainingDisabled tests.
The tests used real HTTP servers and real time, making them susceptible to
race conditions and timing-dependent failures.
The solution is to convert both tests to use synctest for deterministic fake time.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
These bugs were discovered accidentally with code analysis:
- https://app.devin.ai/review/prometheus/prometheus/pull/16355
Upon further inspection and performing more analysis, 3 potential bugs were found:
1. sendloops could continue running if corresponding AM changed position in the config
2. multiple configs with the same hash would share sendloops resulting in sets without sendloops
3. sendloops could continue running if the config hash was changed
- `TestApplyConfigSendLoopsNotStoppedOnKeyChange`: Verifies sendLoops work when keys swap (no fix needed)
- `TestApplyConfigDuplicateHashSharesSendLoops`: Verifies sendLoops are independent with duplicate hashes (bug fixed)
- `TestApplyConfigHashChangeLeaksSendLoops`: Verifies sendLoops are cleaned up when hash changes (bug fixed)
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* notifier: unit test for dropping throughput on stuck AM
Ref: https://github.com/prometheus/prometheus/issues/7676
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* chore(notifier): remove year from copyrights
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* feat(notifier): independent alertmanager sendloops
Independent Alertmanager sendloops avoid issues with queue overflowing
when one or more Alertmanager instances are unavailable which could
result in lost alert notifications.
The sendloops are managed per AlertmanagerSet which are dynamically
added/removed with service discovery or configuration reload.
The following metrics now include an extra dimention for alertmanager label:
- prometheus_notifications_dropped_total
- prometheus_notifications_queue_capacity
- prometheus_notifications_queue_length
This change also includes the test from #14099Closes#7676
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>