653 Commits

Author SHA1 Message Date
Bartlomiej Plotka
18626a99c4
fix(rules.Manager): ensure non-nil context (#17103)
Saw some panic on main due to lack of defaulting:
https://github.com/prometheus/prometheus/actions/runs/17317373582/job/49162760911

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-08-29 10:43:59 +01:00
beorn7
71c21fb9e4 Fix minor issues after applying analyzer "modernize"
- The tool left an empty line behind that we don't need anymore, see
  https://github.com/prometheus/prometheus/pull/17092. (Arguably not a
  bug in the tool but just our stricter style about empty lines.)

- In tsdb/index/postings_test.go , our (admittedly somewhat
  convoluted) code structure tricked the tool so it spit out something
  that wouldn't even compile.

- storage/remote/queue_manager_test.go is just a minor formatting
  nit.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-27 15:44:11 +02:00
beorn7
747c5ee2b1 Apply analyzer "modernize" to the whole codebase
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.

This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency what
was done automatically.

Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-27 14:48:41 +02:00
pipiland2612
0246aa22f4 Parralel test
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-08-27 10:47:39 +02:00
Marco Pracucci
954cad35b2
Optimise concurrent rule evaluation for rules querying ALERTS and ALERTS_FOR_STATE (#17064)
* Optimise concurrent rule evaluation for rules querying ALERTS and ALERTS_FOR_STATE

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Further optimised the case of ALERTS and ALERTS_FOR_STATE without alertname label matcher

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2025-08-21 16:57:57 +02:00
Julius Hinze
77b5c3f217
Histograms: set annotation when adding or subtracting histograms that have not_reset and reset hints.
Signed-off-by: Julius Hinze <julius.hinze@grafana.com>
2025-08-20 15:00:45 +02:00
Arve Knudsen
0a40df33fb
Make metric/label name validation scheme explicit (#16928)
* Parameterize metric/label name validation scheme

Parameterized metric/label name validation scheme

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Julius Hinze <julius.hinze@grafana.com>
2025-08-18 08:09:00 +00:00
Matthieu MOREL
cef219c31c chore: enable unused-receiver rule from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-08-04 09:43:33 +00:00
Bryan Boreham
95d47c0512 [TESTS] Rules: remove brittle TestNewGroup
It broke on different implementation of 'NewNopLogger'.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-06-23 15:34:03 +01:00
Anand Rajagopal
41d08003c5 refactor: Move 'for' state restoration metrics to defer block
Signed-off-by: Anand Rajagopal <anrajag@amazon.com>
2025-05-28 16:04:49 +00:00
tongjicoder
4fe20fa340 chore: fix some comments
Signed-off-by: tongjicoder <tongjicoder@icloud.com>
2025-05-27 23:14:41 +08:00
Subhramit Basu
44e27a876e
Add parse alerting for rules files (#16601)
Builds over https://github.com/prometheus/prometheus/pull/16462
Addresses comments, adds invalid rules file

Signed-off-by: subhramit <subhramit.bb@live.in>
Co-authored-by: marcodebba <marcodebonis74@gmail.com>
2025-05-26 18:29:34 +02:00
hardlydearly
ba4b058b7a refactor: use slices.Contains to simplify code
Signed-off-by: hardlydearly <799511800@qq.com>
2025-05-09 08:27:10 +02:00
Arve Knudsen
e7e3ab2824
Fix linting issues found by golangci-lint v2.0.2 (#16368)
* Fix linting issues found by golangci-lint v2.0.2

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-05-03 19:05:13 +02:00
George Krajcsovits
d085f7da9b
Revert "ruler: correct logging of alert name & template data" 2025-03-25 15:36:34 +01:00
George Krajcsovits
e19ff7a000
Merge pull request #15093 from dimitarvdimitrov/dimitar/ruler/unsupported-logger
ruler: correct logging of alert name & template data
2025-03-25 13:47:22 +01:00
Matthieu MOREL
5fa1146e21
chore: enable gci linter (#16245)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-03-22 15:46:13 +00:00
Matthieu MOREL
c7d4b53ec1 chore: enable unused-parameter from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-02-19 19:50:28 +01:00
Julien Duchesne
77a5698190
Ruler Concurrency: Consider __name__ matchers (#16039)
When calculating dependencies between rules, we sometimes run into `{__name__...}` matchers
These can be used the same way as the actual rule names
This will enable even more rules to run concurrently

The new logic is also not slower:
```
julienduchesne@triceratops prometheus % benchstat test-old.txt test.txt
goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/rules
cpu: Apple M3 Pro
                 │ test-old.txt │              test.txt               │
                 │    sec/op    │   sec/op     vs base                │
DependencyMap-11    1.206µ ± 7%   1.024µ ± 7%  -15.10% (p=0.000 n=10)

                 │ test-old.txt │               test.txt               │
                 │     B/op     │     B/op      vs base                │
DependencyMap-11   1.720Ki ± 0%   1.438Ki ± 0%  -16.35% (p=0.000 n=10)

                 │ test-old.txt │              test.txt              │
                 │  allocs/op   │ allocs/op   vs base                │
DependencyMap-11     39.00 ± 0%   34.00 ± 0%  -12.82% (p=0.000 n=10)
```

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
2025-02-17 11:19:16 +00:00
frazou
9b4c8f6be2
rulefmt: support YAML aliases for Alert/Record/Expr (#14957)
* rulefmt: add tests with YAML aliases for Alert/Record/Expr

Altough somewhat discouraged in favour of using proper configuration
management tools to generate full YAML, it can still be useful in some
situations to use YAML anchors/aliases in rules.

The current implementation is however confusing: aliases will work
everywhere except on the alert/record name and expr

This first commit adds (failing) tests to illustrate the issue, the next
one fixes it. The YAML test file is intentionally filled with anchors
and aliases. Although this is probably not representative of a real-world
use case (which would have less of them), it errs on the safer side.

Signed-off-by: François HORTA <fhorta@scaleway.com>

* rulefmt: support YAML aliases for Alert/Record/Expr

This fixes the use of YAML aliases in alert/recording rule names and
expressions. A side effect of this change is that the RuleNode YAML type is
no longer propagated deeper in the codebase, instead the generic Rule type
can now be used.

Signed-off-by: François HORTA <fhorta@scaleway.com>

* rulefmt: Add test for YAML merge combined with aliases

Currently this does work, but adding a test for the related
functionally here makes sense.

Signed-off-by: David Leadbeater <dgl@dgl.cx>

* rulefmt: Rebase to latest changes

Signed-off-by: David Leadbeater <dgl@dgl.cx>

---------

Signed-off-by: François HORTA <fhorta@scaleway.com>
Signed-off-by: David Leadbeater <dgl@dgl.cx>
Co-authored-by: David Leadbeater <dgl@dgl.cx>
2025-02-13 20:48:33 +11:00
Dimitar Dimitrov
237139c992
Merge branch 'main' into dimitar/ruler/unsupported-logger
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2025-02-12 10:55:53 +01:00
Dimitar Dimitrov
2a8ae586f4
ruler: stop all rule groups asynchronously on shutdown (#15804)
* ruler: stop all rule groups asynchronously on shutdown

During shutdown of the rules manager some rule groups have already stopped and are missing evaluations while we're waiting for other groups to finish their evaluation.

When there are many groups (in the thousands), the whole shutdown process can take up to 10 minutes, during which we get miss evaluations.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

* Use wrappers in stop(); rename awaitStopped()

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

* Add comment

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

---------

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2025-01-20 21:26:58 +01:00
Bartlomiej Plotka
39d7242cfe
Merge pull request #15669 from rajagopalanand/restore-for-state
rules: Add an option to restore new rule groups added to existing rule manager
2025-01-17 15:37:07 +01:00
Giedrius Statkevičius
92218ecb9b promtool: add --ignore-unknown-fields
Add --ignore-unknown-fields that ignores unknown fields in rule group
files. There are lots of tools in the ecosystem that "like" to extend
the rule group file structure but they are currently unreadable by
promtool if there's anything extra. The purpose of this flag is so that
we could use the "vanilla" promtool instead of rolling our own.

Some examples of tools/code:

https://github.com/grafana/mimir/blob/main/pkg/mimirtool/rules/rwrulefmt/rulefmt.go
8898eb3cc5/pkg/rules/rules.go (L18-L25)

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2025-01-15 11:34:28 +02:00
Julien Duchesne
0a19f1268e
Rule Concurrency: Simpler loop for sequential (default) executions (#15801) 2025-01-09 23:14:21 +00:00
Julien Duchesne
a768a3b95e
Rule Concurrency: Test safe abort of rule evaluations (#15797)
This test was added in the Grafana fork a while ago: https://github.com/grafana/mimir-prometheus/pull/714 and has been helpful to make sure we can safely terminate rule evaluations early
The new rule evaluation logic (done here: https://github.com/prometheus/prometheus/pull/15681) does not have the bug, but the test was useful to verify that

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
2025-01-08 16:32:48 +00:00
Julien Duchesne
1a27ab29b8
Rules: Store dependencies instead of boolean (#15689)
* Rules: Store dependencies instead of boolean
To improve https://github.com/prometheus/prometheus/pull/15681 further, we'll need to store the dependencies and dependents of each

Right now, if a rule has both (at least 1) dependents and dependencies, it is not possible to determine the order to run the rules and they must all run sequentially

This PR only changes the dependents and dependencies attributes of rules, it does not implement a new topological sort algorithm

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Store a slice of Rule instead

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Add `BenchmarkRuleDependencyController_AnalyseRules` for future reference

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

---------

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
2025-01-06 20:48:38 +00:00
Julien Duchesne
8067f27971
RuleConcurrencyController: Add SplitGroupIntoBatches method (#15681)
* `RuleConcurrencyController`: Add `SplitGroupIntoBatches` method
The concurrency implementation can now return a slice of concurrent rule batches
This allows for additional concurrency as opposed to the current interface which is limited by the order in which the rules have been loaded

Also, I removed the `concurrencyController` attribute from the group. That information is duplicated in the opts.RuleConcurrencyController` attribute, leading to some confusing behavior, especially in tests.

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Address PR comments

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Apply suggestions from code review

Co-authored-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: Julien Duchesne <julienduchesne@live.com>

---------

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
Signed-off-by: Julien Duchesne <julienduchesne@live.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
2025-01-06 18:51:19 +00:00
Dimitar Dimitrov
8b45384a44
Make templateData a struct
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2024-12-24 11:10:13 +02:00
Julien Duchesne
615195372d
Ruler: Move inner eval function (#15688)
Refactoring in prevision of https://github.com/prometheus/prometheus/pull/15681

By moving the `eval` function outside of the for loop, we can modify the rule execution order more easily (without huge git changes)

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
2024-12-17 21:11:55 +00:00
Julien Duchesne
e2f037e554
rules: Add new RuleEvaluationTimeSum field to groups (#15672)
* feat(ruler): Add new `RuleEvaluationTimeSum` field to groups
Coupled with a metric: `rule_group_last_rule_duration_sum_seconds`

This will give us more observability into how fast a group runs with or without concurrency

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Update rules/group.go

Co-authored-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: Julien Duchesne <julienduchesne@live.com>
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Apply suggestions from code review

Co-authored-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: Julien Duchesne <julienduchesne@live.com>
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Remove `in seconds`. A duration is a duration

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

---------

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
Signed-off-by: Julien Duchesne <julienduchesne@live.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
2024-12-13 21:42:46 +00:00
Julien Duchesne
7802ca263d
RuleDependencyController: Fix for indeterminate conditions (#15560)
The dependency map being empty meant that all the rules were being set as having no dependencies or dependents. Which is the opposite of what we want
Added two new tests that verify the behavior, they failed before the fix, running all the rules concurrently

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
2024-12-13 16:48:29 +00:00
Anand Rajagopal
ceb2f653ba Add an option to restore new rule groups added to existing rule manager
Signed-off-by: Anand Rajagopal <anrajag@amazon.com>
2024-12-13 03:15:53 +00:00
Arve Knudsen
c2e28f21ba
rules.NewGroup: Fix when no logger is passed (#15356)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-11-21 16:53:06 +01:00
Matthieu MOREL
af1a19fc78 enable errorf rule from perfsprint linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-11-06 16:50:36 +01:00
Vanshika
cccbe72514
TSDB: Fix some edge cases when OOO is enabled (#14710)
Fix some edge cases when OOO is enabled

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Signed-off-by: Vanshika <102902652+Vanshikav123@users.noreply.github.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
2024-10-23 17:34:28 +02:00
Bryan Boreham
70e2d23027
Merge pull request #11474 from clwluvw/group-label
[FEATURE] rules: add labels at group level
2024-10-21 14:47:12 +01:00
Dimitar Dimitrov
5b2cfc5e45
Merge branch 'main' into dimitar/ruler/unsupported-logger
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2024-10-09 16:53:21 +02:00
TJ Hoplock
6ebfbd2d54 chore!: adopt log/slog, remove go-kit/log
For: #14355

This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-10-07 15:58:50 -04:00
Dimitar Dimitrov
b544127d9a
ruler: correct logging of alert name
The error messages we see in the embedded ruler in Mimir right now are contain `alert="unsupported value type"` and `data="unsupported value type"`

```
ts=2024-10-04T09:59:56.900015691Z caller=alerting.go:384 level=warn component=ruler insight=true user=XXX alert="unsupported value type" msg="Expanding alert template failed" err="error executing template __alert_service_mesh_down_alert: template: __alert_service_mesh_down_alert:1:139: executing \"__alert_service_mesh_down_alert\" at <.registry_name>: can't evaluate field registry_name in type struct { Labels map[string]string; ExternalLabels map[string]string; ExternalURL string; Value interface {} }" data="unsupported value type"
```

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2024-10-04 12:19:12 +02:00
Nathan Baulch
50cd453c8f
chore: Fix typos (#14868)
* Fix typos

---------

Signed-off-by: Nathan Baulch <nathan.baulch@gmail.com>
2024-09-10 22:32:03 +02:00
Arve Knudsen
99204f23ee Merge remote-tracking branch 'prometheus/main' into arve/close-engine
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-08-29 09:52:54 +02:00
riskrole
406bf775aa chore: fix some comments
Signed-off-by: riskrole <yuhang@before.tech>
2024-08-28 11:26:57 +08:00
Arve Knudsen
c9a460d570 Merge remote-tracking branch 'prometheus/main' into arve/close-engine
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-08-26 12:17:10 +02:00
Max Amin
84b819a69f
feat: add Google cloud roundtripper for remote write (#14346)
* feat: Google Auth for remote write

Signed-off-by: Max Amin <maxamin@google.com>

---------

Signed-off-by: Max Amin <maxamin@google.com>
2024-07-30 16:25:19 +01:00
Seena Fallah
f253d36361 rule: allow merging labels from group level
Support merging labels from groups to rule labels

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2024-07-26 20:18:05 +02:00
Arve Knudsen
7c873004c7 Merge remote-tracking branch 'prometheus/main' into arve/close-engine 2024-07-26 11:48:33 +02:00
gotjosh
465891cc56
Rules: Refactor concurrency controller interface (#14491)
* Rules: Refactor concurrency controller interface

Even though the main purpose of this refactor is to modify the interface of the concurrency controller to accept a Context. I did two drive-by modifications that I think are sensible:

1. I have moved the check for dependencies on rules to the controller itself - this aligns with how the controller should behave as it is a deciding factor on wether we should run concurrently or not.
2. I cleaned up some unused methods from the days of the old interface before #13527 changed it.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-07-22 14:11:18 +01:00
Arve Knudsen
fbc9eddfaf Refactor engine creation in tests
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-07-14 13:58:51 +02:00
Arve Knudsen
fec6adadcd Merge remote-tracking branch 'prometheus/main' into arve/close-engine 2024-07-14 13:19:11 +02:00