docs: clarify docs for PromQL aggregation operators (#16837)

Signed-off-by: Charles Korn <charles.korn@grafana.com>
Charles Korn 2025-07-10 23:34:57 +10:00 committed by GitHub
parent 362141370d
commit 8397b738bf


@ -284,21 +284,21 @@ Prometheus supports the following built-in aggregation operators that can be
used to aggregate the elements of a single instant vector, resulting in a new
vector of fewer elements with aggregated values:
* `sum` (calculate sum over dimensions)
* `avg` (calculate the arithmetic average over dimensions)
* `min` (select minimum over dimensions)
* `max` (select maximum over dimensions)
* `bottomk` (smallest _k_ elements by sample value)
* `topk` (largest _k_ elements by sample value)
* `limitk` (sample _k_ elements, **experimental**, must be enabled with `--enable-feature=promql-experimental-functions`)
* `limit_ratio` (sample a pseudo-random ratio _r_ of elements, **experimental**, must be enabled with `--enable-feature=promql-experimental-functions`)
* `group` (all values in the resulting vector are 1)
* `count` (count number of elements in the vector)
* `count_values` (count number of elements with the same value)
* `sum(v)` (calculate sum over dimensions)
* `avg(v)` (calculate the arithmetic average over dimensions)
* `min(v)` (select minimum over dimensions)
* `max(v)` (select maximum over dimensions)
* `bottomk(k, v)` (smallest `k` elements by sample value)
* `topk(k, v)` (largest `k` elements by sample value)
* `limitk(k, v)` (sample `k` elements, **experimental**, must be enabled with `--enable-feature=promql-experimental-functions`)
* `limit_ratio(r, v)` (sample a pseudo-random ratio `r` of elements, **experimental**, must be enabled with `--enable-feature=promql-experimental-functions`)
* `group(v)` (all values in the resulting vector are 1)
* `count(v)` (count number of elements in the vector)
* `count_values(l, v)` (count number of elements with the same value)
* `stddev` (calculate population standard deviation over dimensions)
* `stdvar` (calculate population standard variance over dimensions)
* `quantile` (calculate φ-quantile (0 ≤ φ ≤ 1) over dimensions)
* `stddev(v)` (calculate population standard deviation over dimensions)
* `stdvar(v)` (calculate population standard variance over dimensions)
* `quantile(φ, v)` (calculate φ-quantile (0 ≤ φ ≤ 1) over dimensions)
These operators can either be used to aggregate over **all** label dimensions
or preserve distinct dimensions by including a `without` or `by` clause. These
@ -318,29 +318,62 @@ all other labels are preserved in the output. `by` does the opposite and drops
labels that are not listed in the `by` clause, even if their label values are
identical between all elements of the vector.
`parameter` is only required for `topk`, `bottomk`, `limitk`, `limit_ratio`,
`quantile`, and `count_values`. It is used as the value for _k_, _r_, φ, or the
name of the additional label, respectively.
### Detailed explanations
`sum` sums up sample values in the same way as the `+` binary operator does
between two values. Similarly, `avg` divides the sum by the number of
aggregated samples in the same way as the `/` binary operator. Therefore, all
sample values aggregation into a single resulting vector element must either be
#### `sum`
`sum(v)` sums up sample values in `v` in the same way as the `+` binary operator does
between two values.
All sample values being aggregated into a single resulting vector element must either be
float samples or histogram samples. An aggregation of a mix of both is invalid,
resulting in the removeal of the corresponding vector element from the output
resulting in the removal of the corresponding vector element from the output
vector, flagged by a warn-level annotation.
`min` and `max` only operate on float samples, following IEEE 754 floating
##### Examples
If the metric `memory_consumption_bytes` had time series that fan out by
`application`, `instance`, and `group` labels, we could calculate the total
memory consumption per application and group over all instances via:
sum without (instance) (memory_consumption_bytes)
Which is equivalent to:
sum by (application, group) (memory_consumption_bytes)
If we are just interested in the total memory consumption in **all**
applications, we could simply write:
sum(memory_consumption_bytes)
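
To make the relationship between `by`, `without`, and a plain `sum` concrete, here is a minimal Python sketch of the grouping logic. It is a toy model rather than how Prometheus evaluates queries: `sum_by`, the sample values, and the representation of a vector as `(labels, value)` pairs are all assumptions made for illustration.

```python
from collections import defaultdict


def sum_by(samples, by=None, without=None):
    """Sum float samples per group of a toy instant vector.

    samples: list of (labels_dict, value) pairs.
    by:      keep only these labels as the grouping key.
    without: drop these labels and keep all others.
    With neither clause, all samples fall into a single group.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        if by is not None:
            key = {k: v for k, v in labels.items() if k in by}
        elif without is not None:
            key = {k: v for k, v in labels.items() if k not in without}
        else:
            key = {}
        # A sorted tuple of label pairs serves as a hashable group key.
        totals[tuple(sorted(key.items()))] += value
    return dict(totals)


# Hypothetical samples of memory_consumption_bytes.
memory_consumption_bytes = [
    ({"application": "api", "group": "prod", "instance": "a"}, 100.0),
    ({"application": "api", "group": "prod", "instance": "b"}, 150.0),
    ({"application": "web", "group": "prod", "instance": "a"}, 200.0),
]

per_app_without = sum_by(memory_consumption_bytes, without={"instance"})
per_app_by = sum_by(memory_consumption_bytes, by={"application", "group"})
total = sum_by(memory_consumption_bytes)
```

In this sketch `per_app_without` and `per_app_by` hold the same two groups, and `total` holds one group with an empty label set.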
#### `avg`
`avg(v)` divides the sum of `v` by the number of aggregated samples in the same way
as the `/` binary operator.
All sample values being aggregated into a single resulting vector element must either be
float samples or histogram samples. An aggregation of a mix of both is invalid,
resulting in the removal of the corresponding vector element from the output
vector, flagged by a warn-level annotation.
#### `min` and `max`
`min(v)` and `max(v)` return the minimum or maximum value, respectively, in `v`.
They only operate on float samples, following IEEE 754 floating
point arithmetic, which in particular implies that `NaN` is only ever
considered a minimum or maximum if all aggregated values are `NaN`. Histogram
samples in the input vector are ignored, flagged by an info-level annotation.
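
The `NaN` rule is easy to get wrong when reimplementing it, because a naive comparison-based minimum depends on the order of the inputs. The following Python sketch shows one way to model the rule described above. `agg_min` and `agg_max` are hypothetical helper names, and the functions assume a non-empty list of float values for a single group.

```python
import math


def agg_min(values):
    """Minimum of a group; NaN only if every value is NaN."""
    non_nan = [v for v in values if not math.isnan(v)]
    return min(non_nan) if non_nan else math.nan


def agg_max(values):
    """Maximum of a group; NaN only if every value is NaN."""
    non_nan = [v for v in values if not math.isnan(v)]
    return max(non_nan) if non_nan else math.nan
```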
`topk` and `bottomk` are different from other aggregators in that a subset of
the input samples, including the original labels, are returned in the result
vector. `by` and `without` are only used to bucket the input vector. Similar to
`min` and `max`, they only operate on float samples, considering `NaN` values
#### `topk` and `bottomk`
`topk(k, v)` and `bottomk(k, v)` are different from other aggregators in that a subset of
`k` values from the input samples, including the original labels, are returned in the result vector.
`by` and `without` are only used to bucket the input vector.
Similar to `min` and `max`, they only operate on float samples, considering `NaN` values
to be farthest from the top or bottom, respectively. Histogram samples in the
input vector are ignored, flagged by an info-level annotation.
@ -348,72 +381,108 @@ If used in an instant query, `topk` and `bottomk` return series ordered by
value in descending or ascending order, respectively. If used with `by` or
`without`, then series within each bucket are sorted by value, and series in
the same bucket are returned consecutively, but there is no guarantee that
buckets of series will be returned in any particular order. No sorting applies
to range queries.
buckets of series will be returned in any particular order.
`limitk` and `limit_ratio` also return a subset of the input samples, including
the original labels in the result vector. The subset is selected in a
deterministic pseudo-random way. `limitk` picks _k_ samples, while
`limit_ratio` picks a ratio _r_ of samples (each determined by `parameter`).
This happens independent of the sample type. Therefore, it works for both float
samples and histogram samples. _r_ can be between +1 and -1. The absolute value
of _r_ is used as the selection ratio, but the selection order is inverted for
a negative _r_, which can be used to select complements. For example,
`limit_ratio(0.1, ...)` returns a deterministic set of approximately 10% of
No sorting applies to range queries.
##### Example
To get the 5 instances with the highest memory consumption across all instances we could write:
topk(5, memory_consumption_bytes)
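
As an illustration of the selection rule, the Python sketch below returns whole samples (labels included) and ranks `NaN` values last, so they are picked only when there are fewer than `k` non-`NaN` samples. It models a single bucket of an instant query. The function name `topk` and the `(labels, value)` pair representation are assumptions of this sketch, not Prometheus internals.

```python
import math


def topk(k, samples):
    """Return the k samples with the largest values, labels intact.

    samples: list of (labels_dict, value) pairs.
    NaN values sort after every other value.
    """
    def sort_key(sample):
        value = sample[1]
        if math.isnan(value):
            return (True, 0.0)   # NaN: always last
        return (False, -value)   # larger values first

    return sorted(samples, key=sort_key)[:k]
```

The result is ordered by descending value, matching the ordering described for instant queries.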
#### `limitk`
`limitk(k, v)` returns a subset of `k` input samples, including
the original labels in the result vector.
The subset is selected in a deterministic pseudo-random way.
This happens independent of the sample type.
Therefore, it works for both float samples and histogram samples.
##### Example
To sample 10 time series we could write:
limitk(10, memory_consumption_bytes)
#### `limit_ratio`
`limit_ratio(r, v)` returns a subset of the input samples, including
the original labels in the result vector.
The subset is selected in a deterministic pseudo-random way.
This happens independent of the sample type.
Therefore, it works for both float samples and histogram samples.
`r` can be between +1 and -1. The absolute value of `r` is used as the selection ratio,
but the selection order is inverted for a negative `r`, which can be used to select complements.
For example, `limit_ratio(0.1, ...)` returns a deterministic set of approximately 10% of
the input samples, while `limit_ratio(-0.9, ...)` returns precisely the
remaining approximately 90% of the input samples not returned by
`limit_ratio(0.1, ...)`.
remaining approximately 90% of the input samples not returned by `limit_ratio(0.1, ...)`.
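
One way to picture why a negative `r` yields the complement is to map every series to a stable number in [0, 1) and let `r` select a slice of that interval. The Python sketch below does this under stated assumptions: SHA-256 over the label set is only a stand-in for whatever hash Prometheus uses, and `sample_offset` and `limit_ratio` are hypothetical names.

```python
import hashlib


def sample_offset(labels):
    """Map a label set to a stable pseudo-random number in [0, 1).

    SHA-256 is an assumption of this sketch, not Prometheus's hash.
    """
    encoded = repr(sorted(labels.items())).encode()
    digest = hashlib.sha256(encoded).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def limit_ratio(r, samples):
    """Keep the samples whose offset falls in the slice chosen by r.

    r >= 0 keeps offsets below r; r < 0 keeps offsets at or above 1 + r.
    """
    if r >= 0:
        return [s for s in samples if sample_offset(s[0]) < r]
    return [s for s in samples if sample_offset(s[0]) >= 1.0 + r]
```

With this model, `limit_ratio(0.25, v)` and `limit_ratio(-0.75, v)` split `v` into two disjoint parts, because both use 0.25 as the threshold. The number of samples selected is only approximately `|r|` times the input size, since it depends on where the hashes fall.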
`group` and `count` do not interact with the sample values,
they work in the same way for float samples and histogram samples.
#### `group`
`group(v)` returns 1 for each group that contains any value at that timestamp.
The value may be a float or histogram sample.
#### `count`
`count(v)` returns the number of values at that timestamp, or no value at all
if no values are present at that timestamp.
The value may be a float or histogram sample.
#### `count_values`
`count_values(l, v)` outputs one time series per unique sample value in `v`.
Each series has an additional label, given by `l`, and the label value is the
unique sample value. The value of each time series is the number of times that sample value was present.
`count_values` outputs one time series per unique sample value. Each series has
an additional label. The name of that label is given by the aggregation
parameter, and the label value is the unique sample value. The value of each
time series is the number of times that sample value was present.
`count_values` works with both float samples and histogram samples. For the
latter, a compact string representation of the histogram sample value is used
as the label value.
`stddev` and `stdvar` only work with float samples, following IEEE 754 floating
point arithmetic. Histogram samples in the input vector are ignored, flagged by
an info-level annotation.
`quantile` calculates the φ-quantile, the value that ranks at number φ*N among
the N metric values of the dimensions aggregated over. φ is provided as the
aggregation parameter. For example, `quantile(0.5, ...)` calculates the median,
`quantile(0.95, ...)` the 95th percentile. For φ = `NaN`, `NaN` is returned.
For φ < 0, `-Inf` is returned. For φ > 1, `+Inf` is returned.
### Examples
If the metric `http_requests_total` had time series that fan out by
`application`, `instance`, and `group` labels, we could calculate the total
number of seen HTTP requests per application and group over all instances via:
sum without (instance) (http_requests_total)
Which is equivalent to:
sum by (application, group) (http_requests_total)
If we are just interested in the total of HTTP requests we have seen in **all**
applications, we could simply write:
sum(http_requests_total)
##### Example
To count the number of binaries running each build version we could write:
count_values("version", build_version)
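
The shape of the output is perhaps easiest to see in a small model. The Python sketch below counts identical float values and emits one `(labels, count)` pair per distinct value. The function name, the pair representation, and in particular the `:g` formatting of the label value are assumptions; the exact string representation Prometheus uses is not covered here.

```python
from collections import Counter


def count_values(label, samples):
    """One output element per distinct sample value.

    samples: list of (labels_dict, value) pairs with float values.
    The new label holds the value as a string; the element's value is
    how many input samples had that value.
    """
    counts = Counter(value for _, value in samples)
    # ':g' is a placeholder format chosen for this sketch.
    return [({label: f"{value:g}"}, float(n)) for value, n in counts.items()]


# Hypothetical build_version samples from three instances.
build_version = [
    ({"instance": "a"}, 2.0),
    ({"instance": "b"}, 2.0),
    ({"instance": "c"}, 3.0),
]
versions = count_values("version", build_version)
```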
To get the 5 largest HTTP requests counts across all instances we could write:
#### `stddev`
topk(5, http_requests_total)
`stddev(v)` returns the standard deviation of `v`.
To sample 10 timeseries, for example to inspect labels and their values, we
could write:
`stddev` only works with float samples, following IEEE 754 floating
point arithmetic. Histogram samples in the input vector are ignored, flagged by
an info-level annotation.
limitk(10, http_requests_total)
#### `stdvar`
`stdvar(v)` returns the standard variance of `v`.
`stdvar` only works with float samples, following IEEE 754 floating
point arithmetic. Histogram samples in the input vector are ignored, flagged by
an info-level annotation.
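
For reference, the population (not sample) definitions divide by the number of values N rather than N - 1. A minimal Python sketch, assuming a non-empty list of float values for one group and using the textbook two-pass formula (Prometheus may compute this differently internally):

```python
import math


def stdvar(values):
    """Population variance: mean squared deviation from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def stddev(values):
    """Population standard deviation: square root of the variance."""
    return math.sqrt(stdvar(values))
```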
#### `quantile`
`quantile(φ, v)` calculates the φ-quantile, the value that ranks at number φ*N among
the N metric values of the dimensions aggregated over.
`quantile` only works with float samples. Histogram samples in the input vector
are ignored, flagged by an info-level annotation.
`NaN` is considered the smallest possible value.
For example, `quantile(0.5, ...)` calculates the median, `quantile(0.95, ...)` the 95th percentile.
Special cases:
* For φ = `NaN`, `NaN` is returned.
* For φ < 0, `-Inf` is returned.
* For φ > 1, `+Inf` is returned.
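
The special cases and the `NaN` ordering can be sketched in Python as follows. The linear interpolation between the two neighboring ranks is an assumption of this sketch; the text above only states that the result is the value ranking at φ*N. The function name and the plain list of floats standing in for one aggregation group are also illustrative.

```python
import math


def quantile(phi, values):
    """φ-quantile of a list of floats, with NaN ordered as the smallest value."""
    if not values or math.isnan(phi):
        return math.nan
    if phi < 0:
        return -math.inf
    if phi > 1:
        return math.inf

    def sort_key(v):
        # NaN sorts before every other value.
        return (False, 0.0) if math.isnan(v) else (True, v)

    ordered = sorted(values, key=sort_key)
    n = len(ordered)
    # Assumed scheme: interpolate linearly between neighboring ranks.
    rank = phi * (n - 1)
    lower = int(math.floor(rank))
    upper = min(n - 1, lower + 1)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight
```

Under this interpolation scheme, `quantile(0.5, [1.0, 2.0, 3.0, 4.0])` falls halfway between the two middle values.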
## Binary operator precedence