Add regression test for the bug where avg_over_time with a single
histogram sample would produce +Inf count/sum and NaN zero bucket
due to division by zero. The test verifies that both regular
exponential histograms and native histograms with custom buckets
(NHCB) correctly return the histogram unchanged when averaging
a single sample.
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
As for float samples, Kahan summation is used for the `sum` and `avg` aggregation and for the respective `_over_time` functions.
Kahan summation is not perfect. This commit also adds tests that even Kahan summation cannot reliably pass.
These tests are commented out.
Note that the behavior might be different on other hardware platforms. We have to keep an eye on test failing on other hardware platforms and adjust them accordingly.
Signed-off-by: Aleksandr Smirnov <5targazer@mail.ru>
This implements the TRIM_UPPER (</) and TRIM_LOWER (>/) operators
that allow removing observations below or above a threshold from
a histogram. The implementation zeros out buckets outside the desired
range. It also recalculates the sum, including only bucket counts within
the specified threshold range.
Fixes#14651.
Signed-off-by: sujal shah <sujalshah28092004@gmail.com>
This implements the TRIM_UPPER (</) and TRIM_LOWER (>/) operators
that allow removing observations below or above a threshold from
a histogram. The implementation zeros out buckets outside the desired
range. It also recalculates the sum, including only bucket counts within
the specified threshold range.
Fixes#14651.
Signed-off-by: sujal shah <sujalshah28092004@gmail.com>
The test at line 1283 for avg_over_time(nhcb_metric[13m]) incorrectly
expected counter_reset_hint:gauge in the result. However, the actual
avg_over_time implementation does not explicitly set the CounterResetHint
to GaugeType on its output histogram.
With the new counter reset hint comparison logic added to the promqltest
framework (which compares hints when explicitly specified in expected
results), this incorrect expectation was now being caught.
This fix removes the incorrect counter_reset_hint:gauge from the expected
result, allowing the test to correctly verify the avg_over_time behavior
without asserting a specific hint value that the function does not set.
The counter reset hint comparison logic works as designed: if the expected
histogram has UnknownCounterReset (the default when not specified), no
comparison is performed. Only when a hint is explicitly specified in the
test expectation will it be compared against the actual result.
Fixes the test failure introduced by the counter reset hint comparison
feature in promqltest.
Signed-off-by: Aviral Garg <aviralg2106@gmail.com>
Signed-off-by: aviralgarg05 <gargaviral99@gmail.com>
Fixes#17255.
The implementation happens mostly in the Add and Sub method, but the reconciliation works for all relevant operations. For example, you can now `rate` over a range wherein the custom bucket boundaries are changing.
Any custom bucket reconciliation is flagged with an info-level annotation.
---------
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
Signed-off-by: Linas Medžiūnas <linasm@users.noreply.github.com>
avg_over_time already correctly checked the counter reset hint fo all
histograms, but in sum_over_time, the 1st histogram was missed in the
loop. This commit exposes the bug in a test.
Signed-off-by: beorn7 <beorn@grafana.com>
Fixes#17308.
As explained adding the warn-annotation about conflicting counter
reset hints doesn't happen consistently. Furthermore, because of
incremental mean calculation being used so far (which includes
subtraction), avg calculation always created gauge histograms.
The fix is to make Sub behave like Add WRT counter reset handling, and
then set the result of a subtraction to gauge explicitly in actual
PromQL subtraction (rather than using Sub for something else, like
incremental mean calculation). Also, track the presence of a
CounterReset hint and a NotCounterReset hint separately for the
entirety of aggregated histograms and create the warn-annotation based
on that.
As a minor fix, this commit also consistently creates the warn
annotation in aggregation to be about "aggregation" rather than
"subtraction" or "addition", because the latter are just internal
operations within the aggregation, which is not of interest for the
user.
Signed-off-by: beorn7 <beorn@grafana.com>
The current code stops the walk after we have found the first relevant
function. However, in expressions with multiple legs, we will then use
the `HistogramStatsIterator` at most once. This change should make
sure we explore all legs.
The added tests make sure we are not using `HistogramStatsIterator`
where we shouldn't (but the opposite can only be seen in a benchmark
or with a more explicit test).
Signed-off-by: beorn7 <beorn@grafana.com>
Prior to #17127, we needed to add another level in the AST to trigger
the usage of `HistogramStatsIterator`. This is fixed now.
Signed-off-by: beorn7 <beorn@grafana.com>
This is an attempt to make sure that we are not accidentally warning
about conflicting counter resets in rate calculation, see
https://github.com/prometheus/prometheus/pull/17051#issuecomment-3226503416 .
This is done by being more explicit about the warn expectation.
However, as long as
https://github.com/prometheus/prometheus/issues/15346 is not
addressed, we won't be able to trigger the annotation this way anyway.
However, we can play a trick, by wrapping a suitable expression in
`histogram_count` or `histogram_sum`, which will invoke the
`HistogramStatsIterator`, which in turn creates counter reset hints on
the fly. So this commit also adds tests with that, both for absence of
an annotation with `rate` and presence of an annotation with
`sum_over_time`.
Signed-off-by: beorn7 <beorn@grafana.com>
test tbs
Signed-off-by: beorn7 <beorn@grafana.com>
As `histogram_count` is playing tricks to improve performance, we
better make sure that the limitation of extrapolation below zero still
works as expected.
Signed-off-by: beorn7 <beorn@grafana.com>
This deals with the count field of native histograms in the same way
as with simple float counters. It then scale the whole histogram with
the same factor as it has scaled the count. This will still allow
individual buckets to get extrapolated below zero, but maybe that is
fine.
This implements approach (2) as described in
https://github.com/prometheus/prometheus/issues/15976#issuecomment-3032095158
Signed-off-by: beorn7 <beorn@grafana.com>
This shows how float counters cannot go below zero when extrapolationg
for rate/increase, and how histograms do not have that protection yet,
leading to an overestimation of the rate/increase.
This also demonstrates edge cases where the count extrapolation does
not need to be limited, but an individual bucket still goes below
zero.
Signed-off-by: beorn7 <beorn@grafana.com>
* fix(promql): histogram_quantile NaN observed in native histogram
Fixes: #16578
See the issue for detailed explanation.
When a histogram had only NaN observations and no normal observations,
we returned 0 from the quantile, which is completely wrong. If there were
normal observations but we went over them, we returned the upper bound of
the existing buckets, however that contradicts expectations on
histogram_fraction. Now we return NaN if the quantile is calculated to be
over all normal observations, falling into NaNs (in a virtual +Inf bucket).
We also return info level annotations if we see any NaN observations.
The annotation calls out if we returned NaN or even if we took the
virtual +Inf bucket into account.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* fix(promql): histogram_fraction NaN observed in native histogram
Fixes: #16580
According to the specification we should not take NaN observations
into account when calculating the fraction. This commit fixes that
and adds an info level annotation to let the user know about this.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
If a rate (or increase) is calculated on native histograms, and there
is a counter reset between the 1st and 2nd histogram, we never have to
touch the 1st histogram, so it doesn't even matter if it has an
incompatible bucket layout. So we should not error out in that case.
This simply nulls out the 1st histogram in that case.
Signed-off-by: beorn7 <beorn@grafana.com>
promql: Add additional incompatible nhcb schemas tests for functions and comparison operators
* Add agg_over_time tests for nhcb with incompatible schemas
* Add more function and comparison operator tests
---------
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>