Clean up codeboxes and headings in docs

The new docs site will have syntax highlighting, so this adds language tags
to code boxes that are currently missing them. I didn't add `promql` as a
language yet, since the highlighter doesn't support it and many of the PromQL
codeboxes in our docs aren't strictly valid PromQL anyway: they are often
several expressions listed on separate lines in the same code box. So I'm
leaving that for later.
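
For instance, a previously untagged code box now just names its language after
the opening backticks. Using the recording-rule snippet from further down in
this diff as the (abbreviated) content, the tagged form looks like this:

````markdown
```yaml
groups:
  - name: cpu-node
    rules:
      - record: job_instance_mode:node_cpu_seconds:avg_rate5m
        # ...
```
````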

On the HTTP API page, I moved the curl examples out of the JSON codeboxes and
into their own codeboxes above the JSON output. I considered putting an
"Output:" label between the curl command and the JSON output, but I think it
reads fine without it.
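
For example, the instant-query example now reads as a `bash` box for the
command followed directly by a `json` box with the (abbreviated) response:

````markdown
```bash
curl 'http://localhost:9090/api/v1/query?query=up&time=2015-07-01T20:10:51.781Z'
```

```json
{
  "status" : "success",
  "data" : { ... }
}
```
````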

I also fixed a number of headings which were at the wrong level relative to
their nesting in the document.

I also removed `go` as a language tag from the Go templating examples, because
the Go template language isn't actually Go.

I also adjusted the indent on one codebox to be more reasonable (2 spaces
instead of 8).

Finally, my editor automatically made a bunch of whitespace changes, such as
removing trailing spaces.

Signed-off-by: Julius Volz <julius.volz@gmail.com>

Julius Volz 2025-05-13 15:37:57 +02:00
parent dbf5d01a62
commit 1b818b03d5
11 changed files with 228 additions and 145 deletions


@ -11,7 +11,7 @@ to an external service. Whenever the alert expression results in one or more
vector elements at a given point in time, the alert counts as active for these
elements' label sets.
### Defining alerting rules
## Defining alerting rules
Alerting rules are configured in Prometheus in the same way as [recording
rules](recording_rules.md).
@ -54,7 +54,7 @@ values can be templated.
The `annotations` clause specifies a set of informational labels that can be used to store longer additional information such as alert descriptions or runbook links. The annotation values can be templated.
#### Templating
### Templating
Label and annotation values can be templated using [console
templates](https://prometheus.io/docs/visualization/consoles). The `$labels`
@ -93,7 +93,7 @@ groups:
description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
```
### Inspecting alerts during runtime
## Inspecting alerts during runtime
To manually inspect which alerts are active (pending or firing), navigate to
the "Alerts" tab of your Prometheus instance. This will show you the exact
@ -105,7 +105,7 @@ The sample value is set to `1` as long as the alert is in the indicated active
(pending or firing) state, and the series is marked stale when this is no
longer the case.
### Sending alert notifications
## Sending alert notifications
Prometheus's alerting rules are good at figuring out what is broken *right now*, but
they are not a fully-fledged notification solution. Another layer is needed to
@ -114,6 +114,6 @@ on top of the simple alert definitions. In Prometheus's ecosystem, the
[Alertmanager](https://prometheus.io/docs/alerting/alertmanager/) takes on this
role. Thus, Prometheus may be configured to periodically send information about
alert states to an Alertmanager instance, which then takes care of dispatching
the right notifications.
Prometheus can be [configured](configuration.md) to automatically discover available
Alertmanager instances through its service discovery integrations.
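
As a minimal illustration of that setup (using a static target instead of
service discovery; the Alertmanager address is a placeholder), the Alertmanager
endpoints go into the `alerting` section of the Prometheus configuration:

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Placeholder address; replace with your Alertmanager instance.
            - "alertmanager.example.org:9093"
```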


@ -80,9 +80,9 @@ global:
[ rule_query_offset: <duration> | default = 0s ]
# The labels to add to any time series or alerts when communicating with
# external systems (federation, remote storage, Alertmanager).
# Environment variable references `${var}` or `$var` are replaced according
# to the values of the current environment variables.
# References to undefined variables are replaced by the empty string.
# The `$` character can be escaped by using `$$`.
external_labels:
@ -195,7 +195,7 @@ otlp:
# It preserves all special characters like dots and won't append special suffixes for metric
# unit and type.
#
# WARNING: The "NoTranslation" setting has significant known risks and limitations (see https://prometheus.io/docs/practices/naming/
# for details):
# * Impaired UX when using PromQL in plain YAML (e.g. alerts, rules, dashboard, autoscaling configuration).
# * Series collisions which in the best case may result in OOO errors, in the worst case a silently malformed
@ -484,21 +484,21 @@ metric_relabel_configs:
# that will be kept in memory. 0 means no limit.
[ keep_dropped_targets: <int> | default = 0 ]
# Specifies the validation scheme for metric and label names. Either blank or
# "utf8" for full UTF-8 support, or "legacy" for letters, numbers, colons, and
# underscores.
[ metric_name_validation_scheme: <string> | default "utf8" ]
# Specifies the character escaping scheme that will be requested when scraping
# for metric and label names that do not conform to the legacy Prometheus
# character set. Available options are:
# * `allow-utf-8`: Full UTF-8 support, no escaping needed.
# * `underscores`: Escape all legacy-invalid characters to underscores.
# * `dots`: Escapes dots to `_dot_`, underscores to `__`, and all other
# legacy-invalid characters to underscores.
# * `values`: Prepend the name with `U__` and replace all invalid
# characters with their unicode value, surrounded by underscores. Single
# underscores are replaced with double underscores.
# e.g. "U__my_2e_dotted_2e_name".
# If this value is left blank, Prometheus will default to `allow-utf-8` if the
# validation scheme for the current scrape config is set to utf8, or
@ -517,7 +517,7 @@ metric_relabel_configs:
# reduced as much as possible until it is within the limit.
# To set an upper limit for the schema (equivalent to "scale" in OTel's
# exponential histograms), use the following factor limits:
#
# +----------------------------+----------------------------+
# | growth factor | resulting schema AKA scale |
# +----------------------------+----------------------------+
@ -547,7 +547,7 @@ metric_relabel_configs:
# +----------------------------+----------------------------+
# | 1.002 | 8 |
# +----------------------------+----------------------------+
#
# 0 results in the smallest supported factor (which is currently ~1.0027 or
# schema 8, but might change in the future).
[ native_histogram_min_bucket_factor: <float> | default = 0 ]
@ -564,7 +564,7 @@ Where `<job_name>` must be unique across all scrape configurations.
A `http_config` allows configuring HTTP requests.
```
```yaml
# Sets the `Authorization` header on every request with the
# configured username and password.
# username and username_file are mutually exclusive.
@ -795,7 +795,7 @@ The following meta labels are available on targets during [relabeling](#relabel_
* `__meta_consul_address`: the address of the target
* `__meta_consul_dc`: the datacenter name for the target
* `__meta_consul_health`: the health status of the service
* `__meta_consul_partition`: the admin partition name where the service is registered
* `__meta_consul_metadata_<key>`: each node metadata key value of the target
* `__meta_consul_node`: the node name defined for the target
* `__meta_consul_service_address`: the service address of the target
@ -942,7 +942,7 @@ host: <string>
[ host_networking_host: <string> | default = "localhost" ]
# Sort all non-nil networks in ascending order based on network name and
# get the first network if the container has multiple networks defined,
# thus avoiding collecting duplicate targets.
[ match_first_network: <boolean> | default = true ]
@ -1258,7 +1258,7 @@ The following meta labels are available on targets during [relabeling](#relabel_
#### `loadbalancer`
The `loadbalancer` role discovers one target per Octavia loadbalancer with a
`PROMETHEUS` listener. The target address defaults to the VIP address
of the load balancer.
@ -1471,7 +1471,7 @@ and serves as an interface to plug in custom service discovery mechanisms.
It reads a set of files containing a list of zero or more
`<static_config>`s. Changes to all defined files are detected via disk watches
and applied immediately.
While those individual files are watched for changes,
the parent directory is also watched implicitly. This is to handle [atomic
@ -1984,7 +1984,7 @@ See below for the configuration options for Kuma MonitoringAssignment discovery:
# Address of the Kuma Control Plane's MADS xDS server.
server: <string>
# Client id is used by Kuma Control Plane to compute Monitoring Assignment for specific Prometheus backend.
# This is useful when migrating between multiple Prometheus backends, or having a separate backend for each Mesh.
# When not specified, system hostname/fqdn will be used if available, if not `prometheus` will be used.
[ client_id: <string> ]
@ -2082,7 +2082,7 @@ The following meta labels are available on targets during [relabeling](#relabel_
* `__meta_linode_status`: the status of the linode instance
* `__meta_linode_tags`: a list of tags of the linode instance joined by the tag separator
* `__meta_linode_group`: the display group a linode instance is a member of
* `__meta_linode_gpus`: the number of GPUs of the linode instance
* `__meta_linode_hypervisor`: the virtualization software powering the linode instance
* `__meta_linode_backups`: the backup service status of the linode instance
* `__meta_linode_specs_disk_bytes`: the amount of storage space the linode instance has access to
@ -2603,7 +2603,7 @@ input to a subsequent relabeling step), use the `__tmp` label name prefix. This
prefix is guaranteed to never be used by Prometheus itself.
```yaml
# The source_labels tells the rule what labels to fetch from the series. Any
# labels which do not exist get a blank value (""). Their content is concatenated
# using the configured separator and matched against the configured regular expression
# for the replace, keep, and drop actions.
@ -2894,7 +2894,7 @@ write_relabel_configs:
# For the `io.prometheus.write.v2.Request` message, this option is noop (always true).
[ send_native_histograms: <boolean> | default = false ]
# When enabled, remote-write will resolve the URL host name via DNS, choose one of the IP addresses at random, and connect to it.
# When disabled, remote-write relies on Go's standard behavior, which is to try to connect to each address in turn.
# The connection timeout applies to the whole operation, i.e. in the latter case it is spread over all attempts.
# This is an experimental feature, and its behavior might still change, or even get removed.
@ -2927,7 +2927,7 @@ azuread:
# Azure User-assigned Managed identity.
[ managed_identity:
[ client_id: <string> ] ]
# Azure OAuth.
[ oauth:
@ -3055,8 +3055,8 @@ with this feature.
# that is within the out-of-order window, or (b) too-old, i.e. not in-order
# and before the out-of-order window.
#
# When out_of_order_time_window is greater than 0, it also affects the experimental agent. It allows
# the agent's WAL to accept out-of-order samples that fall within the specified time window relative
# to the timestamp of the last appended sample for the same series.
[ out_of_order_time_window: <duration> | default = 0s ]
```


@ -27,7 +27,7 @@ Generic placeholders are defined as follows:
A valid example file can be found [here](/documentation/examples/web-config.yml).
```
```yaml
tls_server_config:
# Certificate and key files for server to use to authenticate to client.
cert_file: <filename>


@ -34,7 +34,7 @@ When the file is syntactically valid, the checker prints a textual
representation of the parsed rules to standard output and then exits with
a `0` return status.
If there are any syntax errors or invalid input arguments, it prints an error
message to standard error and exits with a `1` return status.
## Recording rules
@ -71,7 +71,8 @@ groups:
```
### `<rule_group>`
```
```yaml
# The name of the group. Must be unique within a file.
name: <string>
@ -98,7 +99,7 @@ rules:
The syntax for recording rules is:
```
```yaml
# The name of the time series to output to. Must be a valid metric name.
record: <string>
@ -114,7 +115,7 @@ labels:
The syntax for alerting rules is:
```
```yaml
# The name of the alert. Must be a valid label value.
alert: <string>
@ -143,7 +144,7 @@ annotations:
See also the
[best practices for naming metrics created by recording rules](https://prometheus.io/docs/practices/rules/#recording-rules).
# Limiting alerts and series
## Limiting alerts and series
A limit for alerts produced by alerting rules and series produced by recording rules
can be configured per-group. When the limit is exceeded, _all_ series produced
@ -152,9 +153,9 @@ the rule, active, pending, or inactive, are cleared as well. The event will be
recorded as an error in the evaluation, and as such no stale markers are
written.
# Rule query offset
## Rule query offset
This is useful to ensure the underlying metrics have been received and stored in Prometheus. Metric availability delays are more likely to occur when Prometheus is running as a remote write target due to the nature of distributed systems, but can also occur when there are anomalies with scraping and/or short evaluation intervals.
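
As a rough sketch of where these per-group settings live (the group name,
values, and rule are illustrative), both the series/alert limit and the query
offset are set at the rule group level:

```yaml
groups:
  - name: example
    # If a rule produces more series or alerts than this, its output for that
    # evaluation is discarded (0 means no limit).
    limit: 100
    # Evaluate the group's queries against data as of one minute in the past.
    query_offset: 1m
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```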
# Failed rule evaluations due to slow evaluation
## Failed rule evaluations due to slow evaluation
If a rule group hasn't finished evaluating before its next evaluation is supposed to start (as defined by the `evaluation_interval`), the next evaluation will be skipped. Subsequent evaluations of the rule group will continue to be skipped until the initial evaluation either completes or times out. When this happens, there will be a gap in the metric produced by the recording rule. The `rule_group_iterations_missed_total` metric will be incremented for each missed iteration of the rule group.


@ -13,7 +13,7 @@ templating](https://golang.org/pkg/text/template/) system.
## Simple alert field templates
```
```yaml
alert: InstanceDown
expr: up == 0
for: 5m
@ -33,7 +33,7 @@ console instead.
This displays a list of instances, and whether they are up:
```go
```
{{ range query "up" }}
{{ .Labels.instance }} {{ .Value }}
{{ end }}
@ -43,7 +43,7 @@ The special `.` variable contains the value of the current sample for each loop
## Display one value
```go
```
{{ with query "some_metric{instance='someinstance'}" }}
{{ . | first | value | humanize }}
{{ end }}
@ -58,7 +58,7 @@ formatting of results, and linking to the [expression browser](https://prometheu
## Using console URL parameters
```go
```
{{ with printf "node_memory_MemTotal{job='node',instance='%s'}" .Params.instance | query }}
{{ . | first | value | humanize1024 }}B
{{ end }}
@ -95,7 +95,7 @@ powerful when combined with
[console library](template_reference.md#console-templates) support, allowing
sharing of templates across consoles.
```go
```
{{/* Define the template */}}
{{define "myTemplate"}}
do something
@ -107,7 +107,7 @@ sharing of templates across consoles.
Templates are limited to one argument. The `args` function can be used to wrap multiple arguments.
```go
```
{{define "myMultiArgTemplate"}}
First argument: {{.arg0}}
Second argument: {{.arg1}}


@ -17,8 +17,8 @@ The primary data structure for dealing with time series data is the sample, defi
```go
type sample struct {
  Labels map[string]string
  Value  interface{}
}
```


@ -23,7 +23,7 @@ Exemplar storage is implemented as a fixed size circular buffer that stores exem
`--enable-feature=memory-snapshot-on-shutdown`
This takes a snapshot of the chunks that are in memory along with the series information when shutting down and stores it on disk. This will reduce the startup time since the memory state can now be restored with this snapshot
and m-mapped chunks, while a WAL replay from disk is only needed for the parts of the WAL that are not part of the snapshot.
## Extra scrape metrics
@ -183,7 +183,7 @@ This state is periodically ([`max_stale`][d2c]) cleared of inactive series.
Enabling this _can_ have negative impact on performance, because the in-memory
state is mutex guarded. Cumulative-only OTLP requests are not affected.
### PromQL arithmetic expressions in time durations
## PromQL arithmetic expressions in time durations
`--enable-feature=promql-duration-expr`
@ -203,7 +203,7 @@ The following operators are supported:
* `+` - addition
* `-` - subtraction
* `*` - multiplication
* `/` - division
* `%` - modulo
* `^` - exponentiation
@ -227,7 +227,7 @@ When enabled, allows for the native ingestion of delta OTLP metrics, storing the
Currently, the StartTimeUnixNano field is ignored, and deltas are given the unknown metric metadata type.
Delta support is in a very early stage of development and the ingestion and querying process may change over time. For the open proposal see [prometheus/proposals#48](https://github.com/prometheus/proposals/pull/48).
### Querying
@ -246,4 +246,4 @@ These may not work well if the `<range>` is not a multiple of the collection int
* It is difficult to figure out whether a metric has delta or cumulative temporality, since there's no indication of temporality in metric names or labels. For now, if you are ingesting a mix of delta and cumulative metrics we advise you to explicitly add your own labels to distinguish them. In the future, we plan to introduce type labels to consistently distinguish metric types and potentially make PromQL functions type-aware (e.g. providing warnings when cumulative-only functions are used with delta metrics).
* If there are multiple samples being ingested at the same timestamp, only one of the points is kept - the samples are **not** summed together (this is how Prometheus works in general - duplicate timestamp samples are rejected). Any aggregation will have to be done before sending samples to Prometheus.


@ -200,7 +200,7 @@ To record the time series resulting from this expression into a new metric
called `job_instance_mode:node_cpu_seconds:avg_rate5m`, create a file
with the following recording rule and save it as `prometheus.rules.yml`:
```
```yaml
groups:
- name: cpu-node
rules:


@ -11,52 +11,52 @@ This document offers guidance on migrating from Prometheus 2.x to Prometheus 3.0
## Flags
- The following feature flags have been removed and they have been added to the
default behavior of Prometheus v3:
- `promql-at-modifier`
- `promql-negative-offset`
- `new-service-discovery-manager`
- `expand-external-labels`
- Environment variable references `${var}` or `$var` in external label values
are replaced according to the values of the current environment variables.
- References to undefined variables are replaced by the empty string.
The `$` character can be escaped by using `$$`.
- `no-default-scrape-port`
- Prometheus v3 will no longer add ports to scrape targets according to the
specified scheme. Target will now appear in labels as configured.
- If you rely on scrape targets like
`https://example.com/metrics` or `http://example.com/metrics` to be
represented as `https://example.com/metrics:443` and
`http://example.com/metrics:80` respectively, add them to your target URLs
- `agent`
- Instead use the dedicated `--agent` CLI flag.
- `remote-write-receiver`
- Instead use the dedicated `--web.enable-remote-write-receiver` CLI flag to enable the remote write receiver.
- `auto-gomemlimit`
- Prometheus v3 will automatically set `GOMEMLIMIT` to match the Linux
container memory limit. If there is no container limit, or the process is
running outside of containers, the system memory total is used. To disable
this, `--no-auto-gomemlimit` is available.
- `auto-gomaxprocs`
- Prometheus v3 will automatically set `GOMAXPROCS` to match the Linux
container CPU quota. To disable this, `--no-auto-gomaxprocs` is available.
Prometheus v3 will log a warning if you continue to pass these to
`--enable-feature`.
## Configuration
- The scrape job level configuration option `scrape_classic_histograms` has been
renamed to `always_scrape_classic_histograms`. If you use the
`--enable-feature=native-histograms` feature flag to ingest native histograms
and you also want to ingest classic histograms that an endpoint might expose
along with native histograms, be sure to add this configuration or change your
configuration from the old name.
- The `http_config.enable_http2` in `remote_write` items default has been
changed to `false`. In Prometheus v2 the remote write http client would
default to use http2. In order to parallelize multiple remote write queues
across multiple sockets it's preferable not to default to http2.
If you prefer to use http2 for remote write you must now set
`http_config.enable_http2: true` in your `remote_write` configuration section.
## PromQL
@ -137,7 +137,7 @@ may now fail if this fallback protocol is not specified.
### TSDB format and downgrade
The TSDB format has been changed slightly in Prometheus v2.55 in preparation for changes
to the index format. Consequently, a Prometheus v3 TSDB can only be read by a
Prometheus v2.55 or newer. Keep that in mind when upgrading to v3 -- you will be only
able to downgrade to v2.55, not lower, without losing your TSDB persistent data.
@ -147,8 +147,8 @@ confirm Prometheus works as expected, before upgrading to v3.
### TSDB storage contract
TSDB compatible storage is now expected to return results matching the specified
selectors. This might impact some third party implementations, most likely
implementing `remote_read`.
This contract is not explicitly enforced, but can cause undefined behavior.
@ -179,7 +179,7 @@ scrape_configs:
```
### Log message format
Prometheus v3 has adopted `log/slog` over the previous `go-kit/log`. This
results in a change of log message format. An example of the old log format is:
```
@ -198,19 +198,19 @@ time=2024-10-24T00:03:07.542+02:00 level=INFO source=/home/user/go/src/github.co
```
### `le` and `quantile` label values
In Prometheus v3, the values of the `le` label of classic histograms and the
`quantile` label of summaries are normalized upon ingestion. In Prometheus v2
the value of these labels depended on the scrape protocol (protobuf vs text
format) in some situations. This led to label values changing based on the
scrape protocol. E.g. a metric exposed as `my_classic_hist{le="1"}` would be
ingested as `my_classic_hist{le="1"}` via the text format, but as
`my_classic_hist{le="1.0"}` via protobuf. This changed the identity of the
metric and caused problems when querying the metric.
In Prometheus v3 these label values will always be normalized to a float like
representation. I.e. the above example will always result in
`my_classic_hist{le="1.0"}` being ingested into prometheus, no matter via which
protocol. The effect of this change is that alerts, recording rules and
dashboards that directly reference label values as whole numbers such as
`le="1"` will stop working.
Ways to deal with this change either globally or on a per metric basis:
@ -236,11 +236,11 @@ This should **only** be applied to metrics that currently produce such labels.
```
### Disallow configuring Alertmanager with the v1 API
Prometheus 3 no longer supports Alertmanager's v1 API. Effectively Prometheus 3
requires [Alertmanager 0.16.0](https://github.com/prometheus/alertmanager/releases/tag/v0.16.0) or later. Users with older Alertmanager
versions or configurations that use `alerting: alertmanagers: [api_version: v1]`
need to upgrade Alertmanager and change their configuration to use `api_version: v2`.
# Prometheus 2.0 migration guide
## Prometheus 2.0 migration guide
For the Prometheus 1.8 to 2.0 please refer to the [Prometheus v2.55 documentation](https://prometheus.io/docs/prometheus/2.55/migration/).
For the migration guide from Prometheus 1.8 to 2.0 please refer to the [Prometheus v2.55 documentation](https://prometheus.io/docs/prometheus/2.55/migration/).


@ -32,7 +32,7 @@ will be returned in the data field.
The JSON response envelope format is as follows:
```
```json
{
"status": "success" | "error",
"data": <data>,
@ -96,7 +96,7 @@ query that may breach server-side URL character limits.
The `data` section of the query result has the following format:
```
```json
{
"resultType": "matrix" | "vector" | "scalar" | "string",
"result": <value>
@ -110,8 +110,11 @@ formats](#expression-query-result-formats).
The following example evaluates the expression `up` at the time
`2015-07-01T20:10:51.781Z`:
```bash
curl 'http://localhost:9090/api/v1/query?query=up&time=2015-07-01T20:10:51.781Z'
```
```json
$ curl 'http://localhost:9090/api/v1/query?query=up&time=2015-07-01T20:10:51.781Z'
{
"status" : "success",
"data" : {
@ -163,7 +166,7 @@ query that may breach server-side URL character limits.
The `data` section of the query result has the following format:
```
```json
{
"resultType": "matrix",
"result": <value>
@ -176,8 +179,11 @@ format](#range-vectors).
The following example evaluates the expression `up` over a 30-second range with
a query resolution of 15 seconds.
```bash
curl 'http://localhost:9090/api/v1/query_range?query=up&start=2015-07-01T20:10:30.781Z&end=2015-07-01T20:11:00.781Z&step=15s'
```
```json
$ curl 'http://localhost:9090/api/v1/query_range?query=up&start=2015-07-01T20:10:30.781Z&end=2015-07-01T20:11:00.781Z&step=15s'
{
"status" : "success",
"data" : {
@ -233,8 +239,11 @@ The `data` section of the query result is a string containing the formatted quer
The following example formats the expression `foo/bar`:
```bash
curl 'http://localhost:9090/api/v1/format_query?query=foo/bar'
```
```json
$ curl 'http://localhost:9090/api/v1/format_query?query=foo/bar'
{
"status" : "success",
"data" : "foo / bar"
@ -264,8 +273,11 @@ The `data` section of the query result is a string containing the AST of the par
The following example parses the expression `foo/bar`:
```bash
curl 'http://localhost:9090/api/v1/parse_query?query=foo/bar'
```
```json
$ curl 'http://localhost:9090/api/v1/parse_query?query=foo/bar'
{
"data" : {
"bool" : false,
@ -343,8 +355,11 @@ contain the label name/value pairs which identify each series.
The following example returns all series that match either of the selectors
`up` or `process_start_time_seconds{job="prometheus"}`:
```bash
curl -g 'http://localhost:9090/api/v1/series?' --data-urlencode 'match[]=up' --data-urlencode 'match[]=process_start_time_seconds{job="prometheus"}'
```
```json
$ curl -g 'http://localhost:9090/api/v1/series?' --data-urlencode 'match[]=up' --data-urlencode 'match[]=process_start_time_seconds{job="prometheus"}'
{
"status" : "success",
"data" : [
@ -389,8 +404,11 @@ The `data` section of the JSON response is a list of string label names.
Here is an example.
```bash
curl 'localhost:9090/api/v1/labels'
```
```json
$ curl 'localhost:9090/api/v1/labels'
{
"status": "success",
"data": [
@ -439,8 +457,11 @@ The `data` section of the JSON response is a list of string label values.
This example queries for all label values for the `http_status_code` label:
```bash
curl http://localhost:9090/api/v1/label/http_status_code/values
```
```json
$ curl http://localhost:9090/api/v1/label/http_status_code/values
{
"status" : "success",
"data" : [
@ -462,8 +483,11 @@ Label names can optionally be encoded using the Values Escaping method, and is n
This example queries for all label values for the `http.status_code` label:
```bash
curl http://localhost:9090/api/v1/label/U__http_2e_status_code/values
```
```json
$ curl http://localhost:9090/api/v1/label/U__http_2e_status_code/values
{
"status" : "success",
"data" : [
@ -489,8 +513,11 @@ URL query parameters:
- `start=<rfc3339 | unix_timestamp>`: Start timestamp.
- `end=<rfc3339 | unix_timestamp>`: End timestamp.
```bash
curl -g 'http://localhost:9090/api/v1/query_exemplars?query=test_exemplar_metric_total&start=2020-09-14T15:22:25.479Z&end=2020-09-14T15:23:25.479Z'
```
```json
$ curl -g 'http://localhost:9090/api/v1/query_exemplars?query=test_exemplar_metric_total&start=2020-09-14T15:22:25.479Z&end=2020-09-14T15:23:25.479Z'
{
"status": "success",
"data": [
@ -556,7 +583,7 @@ is explained in detail in its own section below.
Range vectors are returned as result type `matrix`. The corresponding
`result` property has the following format:
```
```json
[
{
"metric": { "<label_name>": "<label_value>", ... },
@ -578,7 +605,7 @@ and [`sort_by_label`](functions.md#sort_by_label) have no effect for range vecto
Instant vectors are returned as result type `vector`. The corresponding
`result` property has the following format:
```
```json
[
{
"metric": { "<label_name>": "<label_value>", ... },
@ -600,7 +627,7 @@ is used.
Scalar results are returned as result type `scalar`. The corresponding
`result` property has the following format:
```
```json
[ <unix_time>, "<scalar_value>" ]
```
@ -609,7 +636,7 @@ Scalar results are returned as result type `scalar`. The corresponding
String results are returned as result type `string`. The corresponding
`result` property has the following format:
```
```json
[ <unix_time>, "<string_value>" ]
```
@ -620,7 +647,7 @@ The `<histogram>` placeholder used above is formatted as follows.
_Note that native histograms are an experimental feature, and the format below
might still change._
```
```json
{
"count": "<count_of_observations>",
"sum": "<sum_of_observations>",
@ -654,8 +681,11 @@ Dropped targets are subject to `keep_dropped_targets` limit, if set.
`labels` represents the label set after relabeling has occurred.
`discoveredLabels` represent the unmodified labels retrieved during service discovery before relabeling has occurred.
```bash
curl http://localhost:9090/api/v1/targets
```
```json
$ curl http://localhost:9090/api/v1/targets
{
"status": "success",
"data": {
@ -704,9 +734,12 @@ The `state` query parameter allows the caller to filter by active or dropped tar
Note that an empty array is still returned for targets that are filtered out.
Other values are ignored.
```bash
curl 'http://localhost:9090/api/v1/targets?state=active'
```
```json
$ curl 'http://localhost:9090/api/v1/targets?state=active'
{
"status": "success",
"data": {
"activeTargets": [
@ -737,9 +770,12 @@ $ curl 'http://localhost:9090/api/v1/targets?state=active'
The `scrapePool` query parameter allows the caller to filter by scrape pool name.
```bash
curl 'http://localhost:9090/api/v1/targets?scrapePool=node_exporter'
```
```json
$ curl 'http://localhost:9090/api/v1/targets?scrapePool=node_exporter'
{
"status": "success",
"data": {
"activeTargets": [
@ -792,9 +828,11 @@ URL query parameters:
- `group_limit=<number>`: The `group_limit` parameter allows you to specify a limit for the number of rule groups that is returned in a single response. If the total number of rule groups exceeds the specified `group_limit` value, the response will include a `groupNextToken` property. You can use the value of this `groupNextToken` property in subsequent requests in the `group_next_token` parameter to paginate over the remaining rule groups. The `groupNextToken` property will not be present in the final response, indicating that you have retrieved all the available rule groups. Please note that there are no guarantees regarding the consistency of the response if the rule groups are being modified during the pagination process.
- `group_next_token`: the pagination token that was returned in the previous request when the `group_limit` property is set. The pagination token is used to iteratively paginate over a large number of rule groups. To use the `group_next_token` parameter, the `group_limit` parameter also needs to be present. If a rule group that coincides with the next token is removed while you are paginating over the rule groups, a response with status code 400 will be returned.
```json
$ curl http://localhost:9090/api/v1/rules
```bash
curl http://localhost:9090/api/v1/rules
```
```json
{
"data": {
"groups": [
@ -857,9 +895,11 @@ guarantees as the overarching API v1.
GET /api/v1/alerts
```
```json
$ curl http://localhost:9090/api/v1/alerts
```bash
curl http://localhost:9090/api/v1/alerts
```
```json
{
"data": {
"alerts": [
@ -904,6 +944,9 @@ curl -G http://localhost:9091/api/v1/targets/metadata \
--data-urlencode 'metric=go_goroutines' \
--data-urlencode 'match_target={job="prometheus"}' \
--data-urlencode 'limit=2'
```
```json
{
"status": "success",
"data": [
@ -932,9 +975,12 @@ curl -G http://localhost:9091/api/v1/targets/metadata \
The following example returns metadata for all metrics for all targets with
label `instance="127.0.0.1:9090"`.
```json
```bash
curl -G http://localhost:9091/api/v1/targets/metadata \
--data-urlencode 'match_target={instance="127.0.0.1:9090"}'
```
```json
{
"status": "success",
"data": [
@ -983,9 +1029,11 @@ The `data` section of the query result consists of an object where each key is a
The following example returns two metrics. Note that the metric `http_requests_total` has more than one object in the list. At least one target has a value for `HELP` that does not match the rest.
```json
```bash
curl -G http://localhost:9090/api/v1/metadata?limit=2
```
```json
{
"status": "success",
"data": {
@ -1014,9 +1062,11 @@ curl -G http://localhost:9090/api/v1/metadata?limit=2
The following example returns only one metadata entry for each metric.
```json
```bash
curl -G http://localhost:9090/api/v1/metadata?limit_per_metric=1
```
```json
{
"status": "success",
"data": {
@ -1040,9 +1090,11 @@ curl -G http://localhost:9090/api/v1/metadata?limit_per_metric=1
The following example returns metadata only for the metric `http_requests_total`.
```json
```bash
curl -G http://localhost:9090/api/v1/metadata?metric=http_requests_total
```
```json
{
"status": "success",
"data": {
@ -1073,8 +1125,11 @@ GET /api/v1/alertmanagers
Both the active and dropped Alertmanagers are part of the response.
```bash
curl http://localhost:9090/api/v1/alertmanagers
```
```json
$ curl http://localhost:9090/api/v1/alertmanagers
{
"status": "success",
"data": {
@ -1107,8 +1162,11 @@ GET /api/v1/status/config
The config is returned as a dumped YAML file. Due to limitations of the YAML
library, YAML comments are not included.
```bash
curl http://localhost:9090/api/v1/status/config
```
```json
$ curl http://localhost:9090/api/v1/status/config
{
"status": "success",
"data": {
@ -1127,8 +1185,11 @@ GET /api/v1/status/flags
All values are of the result type `string`.
```bash
curl http://localhost:9090/api/v1/status/flags
```
```json
$ curl http://localhost:9090/api/v1/status/flags
{
"status": "success",
"data": {
@ -1154,8 +1215,11 @@ GET /api/v1/status/runtimeinfo
The returned values are of different types, depending on the nature of the runtime property.
```bash
curl http://localhost:9090/api/v1/status/runtimeinfo
```
```json
$ curl http://localhost:9090/api/v1/status/runtimeinfo
{
"status": "success",
"data": {
@ -1190,8 +1254,11 @@ GET /api/v1/status/buildinfo
All values are of the result type `string`.
```bash
curl http://localhost:9090/api/v1/status/buildinfo
```
```json
$ curl http://localhost:9090/api/v1/status/buildinfo
{
"status": "success",
"data": {
@ -1232,8 +1299,11 @@ The `data` section of the query result consists of:
- **memoryInBytesByLabelName** This will provide a list of the label names and memory used in bytes. Memory usage is calculated by adding the length of all values for a given label name.
- **seriesCountByLabelPair** This will provide a list of label value pairs and their series count.
```bash
curl http://localhost:9090/api/v1/status/tsdb
```
```json
$ curl http://localhost:9090/api/v1/status/tsdb
{
"status": "success",
"data": {
@ -1305,8 +1375,11 @@ GET /api/v1/status/walreplay
- **in progress**: The replay is in progress.
- **done**: The replay has finished.
```bash
curl http://localhost:9090/api/v1/status/walreplay
```
```json
$ curl http://localhost:9090/api/v1/status/walreplay
{
"status": "success",
"data": {
@ -1338,8 +1411,11 @@ URL query parameters:
- `skip_head=<bool>`: Skip data present in the head block. Optional.
```bash
curl -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot
```
```json
$ curl -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot
{
"status": "success",
"data": {
@ -1371,8 +1447,8 @@ Not mentioning both start and end times would clear all the data for the matched
Example:
```json
$ curl -X POST \
```bash
curl -X POST \
-g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=up&match[]=process_start_time_seconds{job="prometheus"}'
```
@ -1392,8 +1468,8 @@ PUT /api/v1/admin/tsdb/clean_tombstones
This takes no parameters or body.
```json
$ curl -XPOST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones
```bash
curl -XPOST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones
```
*New in v2.1 and supports PUT from v2.9*
@ -1451,8 +1527,11 @@ GET /api/v1/notifications
Example:
```bash
curl http://localhost:9090/api/v1/notifications
```
```
$ curl http://localhost:9090/api/v1/notifications
{
"status": "success",
"data": [
@ -1477,8 +1556,11 @@ GET /api/v1/notifications/live
Example:
```bash
curl http://localhost:9090/api/v1/notifications/live
```
```
$ curl http://localhost:9090/api/v1/notifications/live
data: {
"status": "success",
"data": [


@ -61,10 +61,10 @@ A Prometheus server's data directory looks something like this:
Note that a limitation of local storage is that it is not clustered or
replicated. Thus, it is not arbitrarily scalable or durable in the face of
drive or node outages and should be managed like any other single node
database.
[Snapshots](querying/api.md#snapshot) are recommended for backups. Backups
made without snapshots run the risk of losing data that was recorded since
the last WAL sync, which typically happens every two hours. With proper
architecture, it is possible to retain years of data in local storage.
@ -75,14 +75,14 @@ performance, and efficiency.
For further details on file format, see [TSDB format](/tsdb/docs/format/README.md).
## Compaction
### Compaction
The initial two-hour blocks are eventually compacted into longer blocks in the background.
Compaction will create larger blocks containing data spanning up to 10% of the retention time,
or 31 days, whichever is smaller.
## Operational aspects
### Operational aspects
Prometheus has several flags that configure local storage. The most important are:
@ -134,16 +134,16 @@ will be used.
Expired block cleanup happens in the background. It may take up to two hours
to remove expired blocks. Blocks must be fully expired before they are removed.
## Right-Sizing Retention Size
### Right-Sizing Retention Size
If you are utilizing `storage.tsdb.retention.size` to set a size limit, you
will want to consider the right size for this value relative to the storage you
have allocated for Prometheus. It is wise to reduce the retention size to provide
a buffer, ensuring that older entries will be removed before the allocated storage
for Prometheus becomes full.
At present, we recommend setting the retention size to, at most, 80-85% of your
allocated Prometheus disk space. This increases the likelihood that older entries
will be removed prior to hitting any disk limitations.
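
For example (the numbers are illustrative), with roughly 100GB of disk
allocated to Prometheus, that guidance suggests capping the retention size at
around 80GB:

```bash
prometheus --storage.tsdb.retention.size=80GB
```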
## Remote storage integrations