* fix(parser): wrong end position aggregate expression
Fixes: https://github.com/prometheus/prometheus/issues/16053
The position range of nested aggregate expression was wrong, for the
expression "(sum(foo))" the position of "sum(foo)" should be 1-9, but
the parser could not decide the end of the expression on pos 9, instead
it read ahead to pos 10 and then emitted the aggregate. But we only
kept the last closing position (10) and wrote that into the aggregate.
The reason for this is that the parser cannot know from "(sum(foo)" alone
if the aggregate is finished. It could be finished as in "(sum(foo))" but
equally it could continue with group modifier as "(sum(foo) by (bar))".
Previous fix in #16041 tried to keep track of parenthesis, but that is
complicated because the error happens after closing two parenthesis. That
fix introduced new bugs.
This fix now addresses the issue directly. Since we have to step outside
the parser state machine anyway, we can just add an algorithm to
detect and fix the issue. That's Lexer.findPrevRightParen().
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
This means we only do it once, rather than on every step of a range
query. Also the code gets a bit shorter.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
In aggregations and function calls. We no longer wrap the literal values
or matrix selectors that would appear here.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Matrix selectors have a Timestamp which indicates they are time-invariant,
so we don't need to wrap and then unwrap them when we come to use them.
Fix up tests that check this level of detail.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Previously, BenchmarkRangeQuery would run each case with three data sizes
(1, 10, 100) and three range lengths (1, 100, 1000) for nine variations.
With 36 cases, running with `-count=6` to get dependable results, would
take 40-50 minutes in total.
This PR removes the middle option in both dimensions, thus shrinking to
four variations and about 20 minutes to run everything.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Currently, the promql functions take the interface slice []parser.Value as an argument,
which is being implemented by the conrete types Vector, Matrix etc. This PR replaces
the interface with the concrete types, resulting in improved performance.
The inspiration for this PR came from #16698 which does this for binops.
I extended the idea to all promql functions
Signed-off-by: darshanime <deathbullet@gmail.com>
* pass single Matrix
Signed-off-by: darshanime <deathbullet@gmail.com>
---------
Signed-off-by: darshanime <deathbullet@gmail.com>
Observed on go1.24.4 darwin/arm64, the compensation variable is not correctly calculated.
Note you may have to run the tests several times to see it, to get the right ordering of series.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
As `histogram_count` is playing tricks to improve performance, we
better make sure that the limitation of extrapolation below zero still
works as expected.
Signed-off-by: beorn7 <beorn@grafana.com>
This deals with the count field of native histograms in the same way
as with simple float counters. It then scale the whole histogram with
the same factor as it has scaled the count. This will still allow
individual buckets to get extrapolated below zero, but maybe that is
fine.
This implements approach (2) as described in
https://github.com/prometheus/prometheus/issues/15976#issuecomment-3032095158
Signed-off-by: beorn7 <beorn@grafana.com>
This shows how float counters cannot go below zero when extrapolationg
for rate/increase, and how histograms do not have that protection yet,
leading to an overestimation of the rate/increase.
This also demonstrates edge cases where the count extrapolation does
not need to be limited, but an individual bucket still goes below
zero.
Signed-off-by: beorn7 <beorn@grafana.com>
step() is a new keyword introduced to represent the query step width in duration expressions.
min(a,b) and max(a,b) return the min and max from two duration expressions.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
This commit brings back direct mean calculation (for `avg` and
`avg_over_time`) but isn't an outright revert of #16569. It keeps the
improved incremental mean calculation and features generally a bit
cleaner code than before.
Also, this commit...
- ...updates the lengthy comment explaining the whole situation and
trade-offs.
- ...divides the running sum and the Kahan compensation term
separately (in direct mean calculation) to avoid the (unlikely)
possibility that sum and Kahan compensation together ovorflow
float64.
- ...uncomments the tests that should now work again on darwin/arm64.
- ...uncomments the test that should now reliably yield the
(inaccurate) value 0 on all hardware platforms. Also, the test
description has been updated accordingly.
- ...adds avg_over_time tests for zero and one sample in the range.
Signed-off-by: beorn7 <beorn@grafana.com>
The test in question actually worked fine even before #16569. The
finding reported in the comment has turned out to be caused by
something else.
Signed-off-by: beorn7 <beorn@grafana.com>
* fix(promql): histogram_quantile NaN observed in native histogram
Fixes: #16578
See the issue for detailed explanation.
When a histogram had only NaN observations and no normal observations,
we returned 0 from the quantile, which is completely wrong. If there were
normal observations but we went over them, we returned the upper bound of
the existing buckets, however that contradicts expectations on
histogram_fraction. Now we return NaN if the quantile is calculated to be
over all normal observations, falling into NaNs (in a virtual +Inf bucket).
We also return info level annotations if we see any NaN observations.
The annotation calls out if we returned NaN or even if we took the
virtual +Inf bucket into account.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* fix(promql): histogram_fraction NaN observed in native histogram
Fixes: #16580
According to the specification we should not take NaN observations
into account when calculating the fraction. This commit fixes that
and adds an info level annotation to let the user know about this.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
This commit fixes the evaluation of invalid expressions like
`sum(rate(`. Before that, it would trigger a panic in the PromQL engine
because it tried to access an index which is out of range.
The bug was probably introduced by 06d0b063ea.
Signed-off-by: Simon Pasquier <spasquie@redhat.com>