Fix incorrect interpolation when counter resets occur in smoothed range
selector evaluation. Previously, the asymmetric handling of counter
resets (y1=0 on left edge, y2+=y1 on right edge) produced wrong values.
Now uniformly set y1=0 when a counter reset is detected, correctly
modeling the counter as starting from 0 post-reset.
This fixes rate calculations across counter resets. For example,
rate(metric[10s] smoothed) where metric goes from 100 to 10 (a reset)
now correctly computes 0.666... by treating the counter as resetting
to 0 rather than producing inflated values from the old behavior.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Add fast path that returns early when no duplicate labelsets exist,
avoiding allocations in the common case. For the merge case, simplify
collision detection by checking for duplicate timestamps after sorting
instead of building a timestamp map, reducing memory overhead.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Generally, binary operations between two vectors fail if there is a many-to-one
or one-to-many matching situation between series within a match group and no
`group_left()` or `group_right()` modifier is present. For filter ops this is
also generally the case, but there can be situations where multiple series on
one side can match a single series on the other side, but only 0 or 1 of those
multiple series remains after the filter operator has been applied. In this
case, the PromQL engine does not produce a matching error, since it only tracks
series matching for those series that survive the filtering. IMO this is
incorrect behavior (which can also erratically make a query sometimes fail and
sometimes succeed, depending on current sample values), and we should always
produce an error if there is a match error prior to applying the filter op.
This PR ensures that we do the cardinality / match tracking independently of
the result of the filter operation.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
This adds the following native histograms (with a few classic buckets for backwards compatibility), while keeping the corresponding summaries (same name, just without `_histogram`):
- `prometheus_sd_refresh_duration_histogram_seconds`
- `prometheus_rule_evaluation_duration_histogram_seconds`
- `prometheus_rule_group_duration_histogram_seconds`
- `prometheus_target_sync_length_histogram_seconds`
- `prometheus_target_interval_length_histogram_seconds`
- `prometheus_engine_query_duration_histogram_seconds`
Signed-off-by: Harsh <harshmastic@gmail.com>
Signed-off-by: harsh kumar <135993950+hxrshxz@users.noreply.github.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
Methods added:
- `SampleOffset(metric *labels.Labels) float64` to calculate the sample offset for a given label set.
- `AddRatioSampleWithOffset(ratioLimit, sampleOffset float64) bool` to find out whether a given sample offset falls within a given ratio limit.
The already existing method `AddRatioSample(ratioLimit float64, sample *Sample) bool` is now implemented as a simple combination of the two other methods. Exposing these methods helps downstream projects to re-use the implementations including easier testing.
Signed-off-by: Andrew Hall <andrew.hall@grafana.com>
In this PR, we are eliminating expensive string-keyed (by signature) maps that are accessed for every sample processed. During preprocessing in rangeEval, we assign a unique number from 0 to n-1 to each of the n string signature values, and later only use this number as a label set signature.
Signed-off-by: Linas Medžiūnas <linasm@users.noreply.github.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Fixes#17255.
The implementation happens mostly in the Add and Sub method, but the reconciliation works for all relevant operations. For example, you can now `rate` over a range wherein the custom bucket boundaries are changing.
Any custom bucket reconciliation is flagged with an info-level annotation.
---------
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
Signed-off-by: Linas Medžiūnas <linasm@users.noreply.github.com>
Fixes#17308.
As explained adding the warn-annotation about conflicting counter
reset hints doesn't happen consistently. Furthermore, because of
incremental mean calculation being used so far (which includes
subtraction), avg calculation always created gauge histograms.
The fix is to make Sub behave like Add WRT counter reset handling, and
then set the result of a subtraction to gauge explicitly in actual
PromQL subtraction (rather than using Sub for something else, like
incremental mean calculation). Also, track the presence of a
CounterReset hint and a NotCounterReset hint separately for the
entirety of aggregated histograms and create the warn-annotation based
on that.
As a minor fix, this commit also consistently creates the warn
annotation in aggregation to be about "aggregation" rather than
"subtraction" or "addition", because the latter are just internal
operations within the aggregation, which is not of interest for the
user.
Signed-off-by: beorn7 <beorn@grafana.com>
* Add anchored and smoothed to vector selectors.
This adds "anchored" and "smoothed" keywords that can be used following a matrix selector.
"Anchored" selects the last point before the range (or the first one after the range) and adds it at the boundary of the matrix selector.
"Smoothed" applies linear interpolation at the edges using the points around the edges. In the absence of a point before or after the edge, the first or the last point is added to the edge, without interpolation.
*Exemple usage*
* `increase(caddy_http_requests_total[5m] anchored)` (equivalent of *caddy_http_requests_total - caddy_http_requests_total offset 5m* but takes counter reset into consideration)
* `rate(caddy_http_requests_total[step()] smoothed)`
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* Update docs/feature_flags.md
Co-authored-by: Charles Korn <charleskorn@users.noreply.github.com>
Signed-off-by: Julien <291750+roidelapluie@users.noreply.github.com>
* Smoothed/Anchored rate: Add more tests
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* Anchored/Smoothed modifier: error out with histograms
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
---------
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Signed-off-by: Julien <291750+roidelapluie@users.noreply.github.com>
Co-authored-by: Charles Korn <charleskorn@users.noreply.github.com>
The current code stops the walk after we have found the first relevant
function. However, in expressions with multiple legs, we will then use
the `HistogramStatsIterator` at most once. This change should make
sure we explore all legs.
The added tests make sure we are not using `HistogramStatsIterator`
where we shouldn't (but the opposite can only be seen in a benchmark
or with a more explicit test).
Signed-off-by: beorn7 <beorn@grafana.com>
PR #16702 introduced a regression because it was too strict in
detecting the condition for using the `HistogramStatsIterator`. It
essentially required the triggering function to be buried at least one
level deep.
`histogram_count(sum(rate(native_histogram_series[2m]))` would not
trigger anymore, but
`1*histogram_count(sum(rate(native_histogram_series[2m]))` would.
Ironically, PR #16682 made the performance of the
`HistogramStatsIterator` so much worse that _not_ using it was often
better, but this has to be addressed in a separate commit.
This commit reinstates the previous `HistogramStatsIterator` detection
behavior, as PR #16702 intended to keep it.
Relevant benchmark changes with this commit (i.e. old is without using
`HistogramStatsIterator`, new is with `HistogramStatsIterator`):
name old time/op new time/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 802ms ± 3% 837ms ± 3% +4.42% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 1.22s ± 3% 1.11s ± 1% -9.46% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 611ms ± 5% 751ms ± 6% +22.87% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 975ms ± 4% 1131ms ±11% +16.04% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 222MB ± 0% 531MB ± 0% +139.63% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 323MB ± 0% 528MB ± 0% +63.81% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 179MB ± 0% 452MB ± 0% +153.07% (p=0.016 n=4+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 175MB ± 0% 452MB ± 0% +157.73% (p=0.016 n=4+5)
name old allocs/op new allocs/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 4.48M ± 0% 8.95M ± 0% +99.51% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 5.02M ± 0% 8.84M ± 0% +75.89% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 3.00M ± 0% 5.96M ± 0% +98.93% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 2.89M ± 0% 5.86M ± 0% +102.69% (p=0.016 n=4+5)
Signed-off-by: beorn7 <beorn@grafana.com>
So far, we emitted a `HistogramCounterResetCollisionWarning` when
encountering conflicting counter resets in the calculation of (i)rate
and friends. We even tested for that. However, in the rate
calculation, we are not interested in those collisions. They are
actually expected.
On the other hand, we did not warn about those collisions when doing a
`sum` aggregation, where such a warning would be appropriate.
This commit removes the warning in the former case and adds it in the
latter. Sadly, we cannot really test this as we still remove the
counter reset hint for the first sample in a chunk. (And that's the
only sample where we could get a `NotCounterReset` hint.)
Signed-off-by: beorn7 <beorn@grafana.com>
Add a first_over_time function, and corresponding ts_of_first_over_time
function. Both are behind the experimental functions feature flag.
Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.
This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency what
was done automatically.
Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.
Signed-off-by: beorn7 <beorn@grafana.com>
This means we only do it once, rather than on every step of a range
query. Also the code gets a bit shorter.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
In aggregations and function calls. We no longer wrap the literal values
or matrix selectors that would appear here.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Matrix selectors have a Timestamp which indicates they are time-invariant,
so we don't need to wrap and then unwrap them when we come to use them.
Fix up tests that check this level of detail.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Currently, the promql functions take the interface slice []parser.Value as an argument,
which is being implemented by the conrete types Vector, Matrix etc. This PR replaces
the interface with the concrete types, resulting in improved performance.
The inspiration for this PR came from #16698 which does this for binops.
I extended the idea to all promql functions
Signed-off-by: darshanime <deathbullet@gmail.com>
* pass single Matrix
Signed-off-by: darshanime <deathbullet@gmail.com>
---------
Signed-off-by: darshanime <deathbullet@gmail.com>
step() is a new keyword introduced to represent the query step width in duration expressions.
min(a,b) and max(a,b) return the min and max from two duration expressions.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
This commit brings back direct mean calculation (for `avg` and
`avg_over_time`) but isn't an outright revert of #16569. It keeps the
improved incremental mean calculation and features generally a bit
cleaner code than before.
Also, this commit...
- ...updates the lengthy comment explaining the whole situation and
trade-offs.
- ...divides the running sum and the Kahan compensation term
separately (in direct mean calculation) to avoid the (unlikely)
possibility that sum and Kahan compensation together ovorflow
float64.
- ...uncomments the tests that should now work again on darwin/arm64.
- ...uncomments the test that should now reliably yield the
(inaccurate) value 0 on all hardware platforms. Also, the test
description has been updated accordingly.
- ...adds avg_over_time tests for zero and one sample in the range.
Signed-off-by: beorn7 <beorn@grafana.com>