12 Commits

Author SHA1 Message Date
Arve Knudsen
020a0b30a0
notifier: fix flaky TestStop_DrainingEnabled and TestStop_DrainingDisabled race conditions (#17938)
Fix flaky TestStop_DrainingEnabled and TestStop_DrainingDisabled tests.
The tests used real HTTP servers and real time, making them susceptible to
race conditions and timing-dependent failures.
The solution is to convert both tests to use synctest for deterministic fake time.

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2026-01-29 08:07:32 +01:00
Arve Knudsen
ade3f08eca
notifier: fix flaky TestHangingNotifier race condition (#17934)
* notifier: fix flaky TestHangingNotifier race condition

Make deterministic through `synctest.Test()`.

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2026-01-27 17:06:46 +01:00
Siavash Safi
2437977bff
fix(notify): apply config sendloop cleanup fix (#17915)
These bugs were discovered accidentally with code analysis:
- https://app.devin.ai/review/prometheus/prometheus/pull/16355

Further inspection and analysis found three potential bugs:
1. sendloops could continue running if the corresponding AM changed position in the config
2. multiple configs with the same hash would share sendloops, resulting in sets without sendloops
3. sendloops could continue running if the config hash changed

- `TestApplyConfigSendLoopsNotStoppedOnKeyChange`: Verifies sendLoops work when keys swap (no fix needed)
- `TestApplyConfigDuplicateHashSharesSendLoops`: Verifies sendLoops are independent with duplicate hashes (bug fixed)
- `TestApplyConfigHashChangeLeaksSendLoops`: Verifies sendLoops are cleaned up when hash changes (bug fixed)
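
The cleanup rules that these tests exercise can be sketched roughly as below. This is an illustration only, not the code from this commit: the `loop` and `setConfig` types, the `reconcile` function, and the `config-N` keys are hypothetical names chosen for the example.

```go
package main

// loop stands in for a running sendloop (hypothetical type).
// stopped records that cleanup has happened.
type loop struct {
	hash    string
	stopped bool
}

// setConfig is a hypothetical view of one Alertmanager config entry:
// key identifies the set, hash fingerprints its settings.
type setConfig struct {
	key, hash string
}

// reconcile builds the loops for a new configuration.
// A loop is reused only when both key and hash are unchanged, so two
// sets with the same hash never share a loop. Every old loop that is
// not reused is stopped, whether its key vanished or its hash changed.
func reconcile(old map[string]*loop, cfgs []setConfig) map[string]*loop {
	next := make(map[string]*loop, len(cfgs))
	for _, c := range cfgs {
		if l, ok := old[c.key]; ok && l.hash == c.hash {
			next[c.key] = l // unchanged set: keep its loop running
			continue
		}
		next[c.key] = &loop{hash: c.hash} // new or changed set: fresh loop
	}
	for k, l := range old {
		if next[k] != l {
			l.stopped = true // not carried over: clean up
		}
	}
	return next
}
```

Calling `reconcile` with two entries that have the same hash yields two distinct loops, and calling it again after one entry's hash changes marks only that entry's old loop as stopped.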

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
2026-01-22 22:22:44 +01:00
Siavash Safi
d9ccd70ac1
fix(notify): flaky tests (#17899)
Add a helper function to set up AlertmanagerSets.
Fix all flaky tests.

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
2026-01-22 11:24:35 +00:00
Siavash Safi
a89c665f47
feat(notifier): independent alertmanager sendloops (#16355)
* notifier: unit test for dropping throughput on stuck AM

Ref: https://github.com/prometheus/prometheus/issues/7676

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>

* chore(notifier): remove year from copyrights

Signed-off-by: Siavash Safi <siavash@cloudflare.com>

* feat(notifier): independent alertmanager sendloops

Independent Alertmanager sendloops prevent the queue from overflowing
when one or more Alertmanager instances are unavailable, which could
result in lost alert notifications.
The sendloops are managed per AlertmanagerSet; the sets are dynamically
added/removed by service discovery or configuration reload.
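
As a rough illustration of why separate queues help, the sketch below gives each target its own bounded queue, so a stuck target only drops its own alerts. It is a simplified, single-goroutine model under stated assumptions, not the notifier's implementation: `sendLoop`, `enqueue`, `drain`, `fanOut`, and `demo` are hypothetical names, and dropping the newest alert when full is a choice made for this example.

```go
package main

import "fmt"

// sendLoop models one target's bounded queue (hypothetical type).
type sendLoop struct {
	queue   chan string
	dropped int
}

func newSendLoop(capacity int) *sendLoop {
	return &sendLoop{queue: make(chan string, capacity)}
}

// enqueue never blocks: if this target's queue is full, the alert is
// dropped and counted for this target only.
func (l *sendLoop) enqueue(alert string) bool {
	select {
	case l.queue <- alert:
		return true
	default:
		l.dropped++
		return false
	}
}

// drain simulates a healthy target sending everything it has queued.
func (l *sendLoop) drain() []string {
	var sent []string
	for {
		select {
		case a := <-l.queue:
			sent = append(sent, a)
		default:
			return sent
		}
	}
}

// fanOut offers the alert to every target independently.
func fanOut(loops map[string]*sendLoop, alert string) {
	for _, l := range loops {
		l.enqueue(alert)
	}
}

// demo sends five alerts to a healthy target (drained after each alert)
// and a stuck target (never drained), both with a queue capacity of 2.
func demo() (healthySent, stuckDropped int) {
	loops := map[string]*sendLoop{
		"healthy": newSendLoop(2),
		"stuck":   newSendLoop(2),
	}
	for i := 0; i < 5; i++ {
		fanOut(loops, fmt.Sprintf("alert-%d", i))
		healthySent += len(loops["healthy"].drain())
	}
	return healthySent, loops["stuck"].dropped
}
```

In this model the healthy target delivers all five alerts while the stuck target drops three, which is the per-target behavior that a per-alertmanager label on the dropped/queue metrics would make visible.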

The following metrics now include an extra dimension for the alertmanager label:
- prometheus_notifications_dropped_total
- prometheus_notifications_queue_capacity
- prometheus_notifications_queue_length

This change also includes the test from #14099

Closes #7676

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>

---------

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2026-01-20 10:33:07 +01:00
Ben Kochie
e14795bbf4
Remove copyright date from headers (#17785)
Remove copyright dates from various files as part of [PROM-50].

[PROM-50]: https://github.com/prometheus/proposals/blob/main/proposals/0050-remove-copyright-dates.md

Signed-off-by: SuperQ <superq@gmail.com>
2026-01-05 13:46:21 +01:00
Ben Kochie
48956f60d7
Update modernize (#17471)
Apply additional Go modernize tool improvements.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-04 05:13:49 +00:00
Arve Knudsen
913cc8f72b
Replace gopkg.in/yaml.v2 with go.yaml.in/yaml/v2 (#17151)
* Replace gopkg.in/yaml.v2 with go.yaml.in/yaml/v2
* Upgrade to client_golang@v1.23.2

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-09-06 13:04:24 +02:00
machine424
bd725fd6b8
test(notifier): add a test showing an alert mutation bug between alertmanager_config (alertmanagersets)
The alert_relabel_configs should only apply to the corresponding alertmanagerset

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-08-25 17:04:14 +02:00
Arve Knudsen
0a40df33fb
Make metric/label name validation scheme explicit (#16928)
* Parameterize metric/label name validation scheme

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Julius Hinze <julius.hinze@grafana.com>
2025-08-18 08:09:00 +00:00
Matthieu MOREL
cef219c31c
chore: enable unused-receiver rule from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-08-04 09:43:33 +00:00
Siavash Safi
ef48e4cb9f
chore: refactor notifier package
Split the notifier package into smaller source files.

Signed-off-by: Siavash Safi <siavash@cloudflare.com>
2025-04-03 17:48:04 +11:00