mirror of
https://github.com/prometheus/prometheus.git
synced 2025-08-16 02:57:11 +02:00
The new docs site will have syntax highlighting, so this adds language tags to code boxes that are currently missing them. I didn't add `promql` as a language yet since the highlighter doesn't support it yet, plus a lot of the PromQL codeboxes in our docs aren't strictly valid PromQL, they are more like multiple expressions listed in the same code box on multiple lines. So I'm leaving that for sometime later. In the HTTP API page, I moved the curl examples from the JSON codeboxes to their own ones above the JSON output. I considered putting an "Output:" text between the curl + JSON output, but I think the way it currently looks without it is probably fine. I also fixed a number of headings which were at the wrong level relative to their nesting in the document. I also removed `go` as a language from the Go template language examples, because the Go template language isn't Go at all. I also adjusted the indent on one codebox to be more reasonable (2 spaces instead of 8). And then finally, my editor made a bunch of whitespace changes automatically, like removing trailing spaces. Signed-off-by: Julius Volz <julius.volz@gmail.com> Signed-off-by: Julius Volz <julius.volz@gmail.com>
120 lines
4.8 KiB
Markdown
120 lines
4.8 KiB
Markdown
---
|
|
title: Alerting rules
|
|
sort_rank: 3
|
|
---
|
|
|
|
# Alerting rules
|
|
|
|
Alerting rules allow you to define alert conditions based on Prometheus
|
|
expression language expressions and to send notifications about firing alerts
|
|
to an external service. Whenever the alert expression results in one or more
|
|
vector elements at a given point in time, the alert counts as active for these
|
|
elements' label sets.
|
|
|
|
## Defining alerting rules
|
|
|
|
Alerting rules are configured in Prometheus in the same way as [recording
|
|
rules](recording_rules.md).
|
|
|
|
An example rules file with an alert would be:
|
|
|
|
```yaml
|
|
groups:
|
|
- name: example
|
|
labels:
|
|
team: myteam
|
|
rules:
|
|
- alert: HighRequestLatency
|
|
expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
|
|
for: 10m
|
|
keep_firing_for: 5m
|
|
labels:
|
|
severity: page
|
|
annotations:
|
|
summary: High request latency
|
|
```
|
|
|
|
The optional `for` clause causes Prometheus to wait for a certain duration
|
|
between first encountering a new expression output vector element and counting
|
|
an alert as firing for this element. In this case, Prometheus will check that
|
|
the alert continues to be active during each evaluation for 10 minutes before
|
|
firing the alert. Elements that are active, but not firing yet, are in the pending state.
|
|
Alerting rules without the `for` clause will become active on the first evaluation.
|
|
|
|
There is also an optional `keep_firing_for` clause that tells Prometheus to keep
|
|
this alert firing for the specified duration after the firing condition was last met.
|
|
This can be used to prevent situations such as flapping alerts, false resolutions
|
|
due to lack of data loss, etc. Alerting rules without the `keep_firing_for` clause
|
|
will deactivate on the first evaluation where the condition is not met (assuming
|
|
any optional `for` duration described above has been satisfied).
|
|
|
|
The `labels` clause allows specifying a set of additional labels to be attached
|
|
to the alert. Any existing conflicting labels will be overwritten. The label
|
|
values can be templated.
|
|
|
|
The `annotations` clause specifies a set of informational labels that can be used to store longer additional information such as alert descriptions or runbook links. The annotation values can be templated.
|
|
|
|
### Templating
|
|
|
|
Label and annotation values can be templated using [console
|
|
templates](https://prometheus.io/docs/visualization/consoles). The `$labels`
|
|
variable holds the label key/value pairs of an alert instance. The configured
|
|
external labels can be accessed via the `$externalLabels` variable. The
|
|
`$value` variable holds the evaluated value of an alert instance.
|
|
|
|
# To insert a firing element's label values:
|
|
{{ $labels.<labelname> }}
|
|
# To insert the numeric expression value of the firing element:
|
|
{{ $value }}
|
|
|
|
Examples:
|
|
|
|
```yaml
|
|
groups:
|
|
- name: example
|
|
rules:
|
|
|
|
# Alert for any instance that is unreachable for >5 minutes.
|
|
- alert: InstanceDown
|
|
expr: up == 0
|
|
for: 5m
|
|
labels:
|
|
severity: page
|
|
annotations:
|
|
summary: "Instance {{ $labels.instance }} down"
|
|
description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
|
|
|
|
# Alert for any instance that has a median request latency >1s.
|
|
- alert: APIHighRequestLatency
|
|
expr: api_http_request_latencies_second{quantile="0.5"} > 1
|
|
for: 10m
|
|
annotations:
|
|
summary: "High request latency on {{ $labels.instance }}"
|
|
description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
|
|
```
|
|
|
|
## Inspecting alerts during runtime
|
|
|
|
To manually inspect which alerts are active (pending or firing), navigate to
|
|
the "Alerts" tab of your Prometheus instance. This will show you the exact
|
|
label sets for which each defined alert is currently active.
|
|
|
|
For pending and firing alerts, Prometheus also stores synthetic time series of
|
|
the form `ALERTS{alertname="<alert name>", alertstate="<pending or firing>", <additional alert labels>}`.
|
|
The sample value is set to `1` as long as the alert is in the indicated active
|
|
(pending or firing) state, and the series is marked stale when this is no
|
|
longer the case.
|
|
|
|
## Sending alert notifications
|
|
|
|
Prometheus's alerting rules are good at figuring what is broken *right now*, but
|
|
they are not a fully-fledged notification solution. Another layer is needed to
|
|
add summarization, notification rate limiting, silencing and alert dependencies
|
|
on top of the simple alert definitions. In Prometheus's ecosystem, the
|
|
[Alertmanager](https://prometheus.io/docs/alerting/alertmanager/) takes on this
|
|
role. Thus, Prometheus may be configured to periodically send information about
|
|
alert states to an Alertmanager instance, which then takes care of dispatching
|
|
the right notifications.
|
|
Prometheus can be [configured](configuration.md) to automatically discover available
|
|
Alertmanager instances through its service discovery integrations.
|