Signed-off-by: Andrew Hall <andrew.hall@grafana.com>
Co-authored-by: Charles Korn <charleskorn@users.noreply.github.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
The current code stops the walk after we have found the first relevant
function. However, in expressions with multiple legs, we will then use
the `HistogramStatsIterator` at most once. This change should make
sure we explore all legs.
The added tests make sure we are not using `HistogramStatsIterator`
where we shouldn't (but the opposite can only be seen in a benchmark
or with a more explicit test).
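To illustrate the difference, here is a rough sketch (with made-up,
simplified AST types rather than the real parser package) of a walk
that keeps descending into every leg instead of returning after the
first relevant function:

    // Hypothetical, simplified AST; the real PromQL parser types differ.
    type node interface{}

    type call struct {
        fn   string
        args []node
    }

    type binaryExpr struct {
        lhs, rhs node
    }

    // markStatsUsage marks every subtree where only histogram stats are
    // needed. Crucially, it does not stop after the first hit: both
    // legs of a binary expression are explored.
    func markStatsUsage(n node) {
        switch n := n.(type) {
        case *call:
            if n.fn == "histogram_count" || n.fn == "histogram_sum" {
                // mark this subtree for the HistogramStatsIterator ...
            }
            for _, arg := range n.args {
                markStatsUsage(arg)
            }
        case *binaryExpr:
            markStatsUsage(n.lhs)
            markStatsUsage(n.rhs) // the second leg must be explored, too
        }
    }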
Signed-off-by: beorn7 <beorn@grafana.com>
Prior to #17127, we needed to add another level in the AST to trigger
the usage of `HistogramStatsIterator`. This is fixed now.
Signed-off-by: beorn7 <beorn@grafana.com>
Previously, multiple calls returned a wrong counter reset hint.
This commit also includes a bunch of refactorings that partially have
value on their own. However, the need for them was triggered by the
additional work needed for idempotency, so I included them in this
commit.
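In other words, the property established here is roughly the following
(a sketch with an assumed iterator value `hsi` and testify-style
assertions, not the actual test code):

    // Two calls for the same sample must agree; previously the second
    // call could return a different (wrong) counter reset hint.
    _, fh1 := hsi.AtFloatHistogram(nil)
    _, fh2 := hsi.AtFloatHistogram(nil)
    require.Equal(t, fh1.CounterResetHint, fh2.CounterResetHint)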
Signed-off-by: beorn7 <beorn@grafana.com>
Hide the wrapping of an NHCB parser on top of another parser inside
the New() function, so that we can easily add directly NHCB-capable
parsers.
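Roughly, the idea is the following (a sketch with illustrative names,
not the actual textparse constructor signature):

    // New picks the base parser for the content type and, if requested,
    // layers the classic-to-NHCB conversion on top. Callers no longer
    // wrap the NHCB parser themselves, so a future parser that emits
    // NHCB natively can simply skip the wrapping here.
    func New(b []byte, contentType string, convertClassicToNHCB bool) Parser {
        p := newBaseParser(b, contentType)
        if convertClassicToNHCB {
            p = newNHCBParser(p)
        }
        return p
    }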
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
After an effective Seek, the lastFH isn't the last FH anymore, so we
should nil it out.
In practice, this should only matter in sub-queries, because we are
otherwise not interested in a counter reset of the first sample
returned after a Seek.
Sub-queries, on the other hand, always do their own counter reset
detection. (For that, they would prefer to see the whole histogram, so
that's another problem for another commit.)
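Sketch of the change (simplified, assuming the usual chunkenc and
histogram packages; field names follow the commit message, the actual
code differs in detail):

    type histogramStatsIterator struct {
        chunkenc.Iterator                // underlying iterator
        lastFH *histogram.FloatHistogram // previous sample, for reset detection
    }

    func (hsi *histogramStatsIterator) Seek(t int64) chunkenc.ValueType {
        typ := hsi.Iterator.Seek(t)
        // After an effective Seek (one that actually moved the
        // iterator), lastFH no longer refers to the sample preceding
        // the new position, so it must not be used for counter reset
        // detection anymore.
        hsi.lastFH = nil
        return typ
    }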
Signed-off-by: beorn7 <beorn@grafana.com>
The `HistogramStatsIterator` is only meant to be used within PromQL.
PromQL only ever uses float histograms. Yet if the
`HistogramStatsIterator` is capable of handling integer histograms, it
will still be asked to return them, for example by the
`BufferedSeriesIterator`, which buffers samples and uses an integer
`Histogram` for them if the underlying chunk is an integer histogram
chunk (which is common).
However, we can simply intercept the `Next` and `Seek` calls and
pretend to only ever be able to return float histograms. This has the
welcome side effect that we do not have to handle a mix of float and
integer histograms in the `HistogramStatsIterator` anymore.
With this commit, the `AtHistogram` call has been changed to panic so
that we ensure it is never called.
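Sketched shape of the interception (using the same illustrative
iterator type as in the Seek sketch further up; Seek is analogous, and
this is not the actual code):

    func (hsi *histogramStatsIterator) Next() chunkenc.ValueType {
        typ := hsi.Iterator.Next()
        if typ == chunkenc.ValHistogram {
            // Pretend we can only ever return float histograms so that
            // buffering callers never ask for integer ones.
            return chunkenc.ValFloatHistogram
        }
        return typ
    }

    func (hsi *histogramStatsIterator) AtHistogram(*histogram.Histogram) (int64, *histogram.Histogram) {
        panic("HistogramStatsIterator: AtHistogram must never be called")
    }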
Benchmark differences between this and the previous commit:
name old time/op new time/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 837ms ± 3% 616ms ± 2% -26.36% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 1.11s ± 1% 0.91s ± 3% -17.75% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 751ms ± 6% 581ms ± 1% -22.63% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 1.13s ±11% 0.85s ± 2% -24.59% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 531MB ± 0% 148MB ± 0% -72.08% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 528MB ± 0% 145MB ± 0% -72.60% (p=0.016 n=5+4)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 452MB ± 0% 145MB ± 0% -67.97% (p=0.016 n=5+4)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 452MB ± 0% 141MB ± 0% -68.70% (p=0.016 n=5+4)
name old allocs/op new allocs/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 8.95M ± 0% 1.60M ± 0% -82.15% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 8.84M ± 0% 1.49M ± 0% -83.16% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 5.96M ± 0% 1.57M ± 0% -73.68% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 5.86M ± 0% 1.46M ± 0% -75.05% (p=0.016 n=5+4)
Signed-off-by: beorn7 <beorn@grafana.com>
PR #16702 introduced a regression because it was too strict in
detecting the condition for using the `HistogramStatsIterator`. It
essentially required the triggering function to be buried at least one
level deep.
`histogram_count(sum(rate(native_histogram_series[2m])))` would not
trigger anymore, but
`1*histogram_count(sum(rate(native_histogram_series[2m])))` would.
Ironically, PR #16682 made the performance of the
`HistogramStatsIterator` so much worse that _not_ using it was often
better, but this has to be addressed in a separate commit.
This commit reinstates the previous `HistogramStatsIterator` detection
behavior, which PR #16702 had intended to keep.
Relevant benchmark changes with this commit (i.e. old is without using
`HistogramStatsIterator`, new is with `HistogramStatsIterator`):
name old time/op new time/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 802ms ± 3% 837ms ± 3% +4.42% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 1.22s ± 3% 1.11s ± 1% -9.46% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 611ms ± 5% 751ms ± 6% +22.87% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 975ms ± 4% 1131ms ±11% +16.04% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 222MB ± 0% 531MB ± 0% +139.63% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 323MB ± 0% 528MB ± 0% +63.81% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 179MB ± 0% 452MB ± 0% +153.07% (p=0.016 n=4+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 175MB ± 0% 452MB ± 0% +157.73% (p=0.016 n=4+5)
name old allocs/op new allocs/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 4.48M ± 0% 8.95M ± 0% +99.51% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 5.02M ± 0% 8.84M ± 0% +75.89% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 3.00M ± 0% 5.96M ± 0% +98.93% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 2.89M ± 0% 5.86M ± 0% +102.69% (p=0.016 n=4+5)
Signed-off-by: beorn7 <beorn@grafana.com>
- Add a code comment about a counter reset edge case (which is
hopefully not relevant in practice).
- Rename the receiver from `f` to `hsi`. (`f` seemed completely
off as a name. `i` or `it` might have worked, too, but I ended up
with `hsi` as the easiest for the reader.)
Signed-off-by: beorn7 <beorn@grafana.com>
So far, we have emitted a `HistogramCounterResetCollisionWarning` when
encountering conflicting counter resets in the calculation of (i)rate
and friends. We even tested for that. However, in the rate
calculation, we are not interested in those collisions. They are
actually expected.
On the other hand, we did not warn about those collisions when doing a
`sum` aggregation, where such a warning would be appropriate.
This commit removes the warning in the former case and adds it in the
latter. Sadly, we cannot really test this as we still remove the
counter reset hint for the first sample in a chunk. (And that's the
only sample where we could get a `NotCounterReset` hint.)
Signed-off-by: beorn7 <beorn@grafana.com>
This is an attempt to make sure that we are not accidentally warning
about conflicting counter resets in rate calculation, see
https://github.com/prometheus/prometheus/pull/17051#issuecomment-3226503416 .
This is done by being more explicit about the warn expectation.
However, as long as
https://github.com/prometheus/prometheus/issues/15346 is not
addressed, we won't be able to trigger the annotation this way anyway.
Still, we can play a trick by wrapping a suitable expression in
`histogram_count` or `histogram_sum`, which will invoke the
`HistogramStatsIterator`, which in turn creates counter reset hints on
the fly. So this commit also adds tests with that, both for absence of
an annotation with `rate` and presence of an annotation with
`sum_over_time`.
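(Concretely, the test expressions are along the lines of
`histogram_count(rate(metric[1m]))`, which must not produce the
annotation, and `histogram_count(sum_over_time(metric[1m]))`, which
must.)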
Signed-off-by: beorn7 <beorn@grafana.com>
test tbs
Signed-off-by: beorn7 <beorn@grafana.com>
Add further tests for first_over_time (also covering the existing
last_over_time, count_over_time, etc.) to exercise vectors
containing a mix of float and histogram samples where the
histogram samples do not come last in the series.
This tripped over https://github.com/prometheus/prometheus/issues/17025
so it's structured a bit oddly to work around that bug in the
appender as used by promtest.
Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
Add a first_over_time function, and corresponding ts_of_first_over_time
function. Both are behind the experimental functions feature flag.
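Roughly speaking, `first_over_time(metric[10m])` evaluates to the
oldest sample of each series within the 10-minute window, and
`ts_of_first_over_time(metric[10m])` to that sample's timestamp,
mirroring last_over_time and ts_of_last_over_time.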
Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.
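(The sweep was essentially a run of the modernize analyzer in fix mode
over the whole module, along the lines of
`go run golang.org/x/tools/gopls/internal/analysis/modernize/cmd/modernize@latest -fix -test ./...`;
see the linked documentation for the authoritative invocation.)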
This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency
about what was done automatically.
Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.
Signed-off-by: beorn7 <beorn@grafana.com>
Right now, TestParseExpressions tests whether a query returns an error, but it only does a fuzzy check on the returned errors.
The error returned by the parser is ParseErrors, which is a slice of ParseErr structs.
The Error() method on ParseErrors returns an error string based on the first error in that slice. This hides the other returned errors, so we can end up with bogus errors being returned and never notice it via this test.
This change makes the test compare the returned error (which is always of type ParseErrors) with an expected ParseErrors slice.
The extra benefit is that the current tests mostly ignore the positional range of errors and only check for the correct error message. Now errors must also carry the expected positional information.
This uncovered a few cases where the positional information of errors seems wrong; I added a FIXME to those lines.
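Sketched shape of the stricter comparison (simplified; the table
fields `c.input`, `c.fail`, and `c.expectedErrs` are illustrative, not
the actual test code):

    _, err := parser.ParseExpr(c.input)
    if c.fail {
        var perrs parser.ParseErrors
        require.ErrorAs(t, err, &perrs)
        // Compare the whole slice, including positional ranges, instead
        // of only the message of the first error.
        require.Equal(t, c.expectedErrs, perrs)
    }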
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* fix(parser): wrong end position of aggregate expressions
Fixes: https://github.com/prometheus/prometheus/issues/16053
The position range of a nested aggregate expression was wrong: for the
expression "(sum(foo))" the position of "sum(foo)" should be 1-9, but
the parser could not decide the end of the expression at pos 9; instead
it read ahead to pos 10 and only then emitted the aggregate. We only
kept the last closing position (10) and wrote that into the aggregate.
The reason for this is that the parser cannot know from "(sum(foo)" alone
if the aggregate is finished. It could be finished as in "(sum(foo))" but
equally it could continue with group modifier as "(sum(foo) by (bar))".
The previous fix in #16041 tried to keep track of parentheses, but
that is complicated because the error happens after closing two
parentheses. That fix introduced new bugs.
This fix now addresses the issue directly. Since we have to step outside
the parser state machine anyway, we can just add an algorithm to
detect and fix the issue. That's Lexer.findPrevRightParen().
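The gist of the fix-up, as an illustrative sketch (not the actual
lexer code):

    // When the aggregate is only emitted after reading one token past
    // its closing parenthesis, walk back from the current position to
    // the ')' that actually closed it and use that as the end position.
    func findPrevRightParen(input string, pos int) int {
        for i := pos - 1; i >= 0; i-- {
            switch input[i] {
            case ')':
                return i + 1 // end positions are exclusive
            case ' ', '\t', '\r', '\n':
                // skip whitespace between the two closing parens
            default:
                return pos // nothing to fix up
            }
        }
        return pos
    }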
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
This means we only do it once, rather than on every step of a range
query. Also the code gets a bit shorter.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
In aggregations and function calls. We no longer wrap the literal values
or matrix selectors that would appear here.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Matrix selectors have a Timestamp which indicates they are time-invariant,
so we don't need to wrap and then unwrap them when we come to use them.
Fix up tests that check this level of detail.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Previously, BenchmarkRangeQuery would run each case with three data sizes
(1, 10, 100) and three range lengths (1, 100, 1000) for nine variations.
With 36 cases, running with `-count=6` to get dependable results would
take 40-50 minutes in total.
This PR removes the middle option in both dimensions, thus shrinking to
four variations and about 20 minutes to run everything.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>