Previously, multiple calls returned a wrong counter reset hint.
This commit also includes a bunch of refactorings that partially have
value on their own. However, the need for them was triggered by the
additional work needed for idempotency, so I included them in this
commit.
Signed-off-by: beorn7 <beorn@grafana.com>
After an effective Seek, the lastFH isn't the lastFH anymore, so we
should nil it out.
In practice, this should only matter is sub-queries, because we are
otherwise not interested in a counter reset of the first sample
returned after a Seek.
Sub-queries, on the other hand, always do their own counter reset
detection. (For that, they would prefer to see the whole histogram, so
that's another problem for another commit.)
Signed-off-by: beorn7 <beorn@grafana.com>
The `HistogramStatsIterator` is only meant to be used within PromQL.
PromQL only ever uses float histograms. If `HistogramStatsIterator` is
capable of handling integer histograms, it will still be used, for
example by the `BufferedSeriesIterator`, which buffers samples and
will use an integer `Histogram` for it, if the underlying chunk is an
integer histogram chunk (which is common).
However, we can simply intercept the `Next` and `Seek` calls and
pretend to only ever be able te return float histograms. This has the
welcome side effect that we do not have to handle a mix of float and
integer histograms in the `HistogramStatsIterator` anymore.
With this commit, the `AtHistogram` call has been changed to panic so
that we ensure it is never called.
Benchmark differences between this and the previous commit:
name old time/op new time/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 837ms ± 3% 616ms ± 2% -26.36% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 1.11s ± 1% 0.91s ± 3% -17.75% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 751ms ± 6% 581ms ± 1% -22.63% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 1.13s ±11% 0.85s ± 2% -24.59% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 531MB ± 0% 148MB ± 0% -72.08% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 528MB ± 0% 145MB ± 0% -72.60% (p=0.016 n=5+4)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 452MB ± 0% 145MB ± 0% -67.97% (p=0.016 n=5+4)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 452MB ± 0% 141MB ± 0% -68.70% (p=0.016 n=5+4)
name old allocs/op new allocs/op delta
NativeHistograms/histogram_count_with_short_rate_interval-16 8.95M ± 0% 1.60M ± 0% -82.15% (p=0.008 n=5+5)
NativeHistograms/histogram_count_with_long_rate_interval-16 8.84M ± 0% 1.49M ± 0% -83.16% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_short_rate_interval-16 5.96M ± 0% 1.57M ± 0% -73.68% (p=0.008 n=5+5)
NativeHistogramsCustomBuckets/histogram_count_with_long_rate_interval-16 5.86M ± 0% 1.46M ± 0% -75.05% (p=0.016 n=5+4)
Signed-off-by: beorn7 <beorn@grafana.com>
- Add a code comment about a counter reset edge case (which is
hopefully not relevant in practice).
- Rename the receiver from `f` to `hsi`. (`f` seemed like completely
off as a name. `i` or `it` might have worked, too, but I ended up
with `hsi` as the easiest for the reader.)
Signed-off-by: beorn7 <beorn@grafana.com>
The problem is in the counter reset detection. The code that loads the
samples is matrixIterSlice which uses the typed Buffer iterator, which
will preload the integer histogram samples, however the last sample is
always(!) loaded as a float histogram sample in matrixIterSlice and the
optimized iterator fails to detect counter resets in that case.
Also the iterator does not reset lastH, lastFH properly.
Ref: https://github.com/prometheus/prometheus/issues/16681
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
The histogram stats iterator does not fully clear the histogram object
and is not resilient to new fields being added to the histogram type.
To resolve the issue, the commit uses the CopyTo methods which should
be future proof to new fields being added.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
The histogram stats decoder keeps track of the last seen histogram sample
in order to properly detect counter resets. We are seeing an issue where
a histogram with UnknownResetHint gets treated as a counter reset when it follows
a stale histogram sample.
I believe that this is incorrect since stale samples should be completely ignored
in PromQL. As a result, they should not be stored in the histogram stats iterator
and the counter reset detection needs to be done against the last non-stale sample.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Implement histogram statistics decoder
This commit speeds up histogram_count and histogram_sum
functions on native histograms. The idea is to have separate decoders which can be
used by the engine to only read count/sum values from histogram objects. This should help
with reducing allocations when decoding histograms, as well as with speeding up aggregations
like sum since they will be done on floats and not on histogram objects.
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
---------
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>