139 Commits

Author SHA1 Message Date
Björn Rabenstein
b8d19543b8
Add histogram validation in remote-read and during reducing resolution (#17561)
ReduceResolution is currently called before validation during
ingestion. This will cause a panic if there are not enough buckets in
the histogram. If there are too many buckets, the spurious buckets are
ignored, and therefore the error in the input histogram is masked.

Furthermore, invalid negative offsets might cause problems, too.

Therefore, we need to do some minimal validation in reduceResolution.
Fortunately, it is easy and shouldn't slow things down. Sadly, it
requires to return errors, which triggers a bunch of code changes.
Even here is a bright side, we can get rud of a few panics. (Remember:
Don't panic!)

In different news, we haven't done a full validation of histograms
read via remote-read. This is not so much a security concern (as you
can throw off Prometheus easily by feeding it bogus data via
remote-read) but more that remote-read sources might be makeshift and
could accidentally create invalid histograms. We really don't want to
panic in that case. So this commit does not only add a check of the
spans and buckets as needed for resolution reduction but also a full
validation during remote-read.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-11-21 00:22:24 +01:00
beorn7
2dfc324821 model/histogram: Make histogram bucket iterators more robust
Currently, iterating over histogram buckets can panic if the spans are
not consistent with the buckets. We aim for validating histograms upon
ingestion, but there might still be data corruptions on disk that
could trigger the panic. While data corruption on disk is really bad
and will lead to all kind of weirdness, we should still avoid
panic'ing.

Note, though, that chunks are secured by checksums, so the corruptions
won't realistically happen because of disk faults, but more likely
because a chunk was generated in a faulty way in the first place, by
a software bug or even maliciously.

This commit prevents panics in the situation where there are fewer
buckets than described by the spans. Note that the missing buckets
will simply not be iterated over. There is no signalling of this
problem. We might still consider this separately, but for now, I would
say that this kind of corruption is exceedingly rare and doesn't
deserve special treatment (which will add a whole lot of complexity to
the code).

Signed-off-by: beorn7 <beorn@grafana.com>
2025-11-19 16:37:51 +01:00
Ben Kochie
48956f60d7
Update modernize (#17471)
Apply additional Go modernize tool improvements.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-04 05:13:49 +00:00
Laurent Dufresne
a6793c20e8 Added tests for histogram.Error
Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>
2025-10-30 08:47:03 +01:00
Laurent Dufresne
7621eb772c histogram: Add Error type for all histogram errors
`histogram.Error` becomes the generic wrapper type for all histogram errors.
This makes it easier and less error prone when adding new errors to check if
an error is an histogram error as well as making it less error prone to convert
the errors.

This change the type of those specific sentinel errors from error to
`histogram.Error`, but it should almost never matter.
e.g., `errors.Is(err, ErrHistogram...)` would still work out of the box.

Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>
2025-10-30 08:45:34 +01:00
George Krajcsovits
37418b5910
Merge pull request #17166 from Naman-B-Parlecha/NamanParlecha/NHCBtoCH
Unroll NHCBs to Classic Histograms func for RW
2025-10-30 08:44:26 +01:00
Naman-B-Parlecha
f14c515cbe fix(histogram): handling +Inf bucket count and metric label
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-10-28 20:29:44 +05:30
Linas Medžiūnas
44df626620
promql (histograms): reconcile mismatched NHCB bounds (#17278)
Fixes #17255.

The implementation happens mostly in the Add and Sub method, but the reconciliation works for all relevant operations. For example, you can now `rate` over a range wherein the custom bucket boundaries are changing.

Any custom bucket reconciliation is flagged with an info-level annotation.

---------

Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
Signed-off-by: Linas Medžiūnas <linasm@users.noreply.github.com>
2025-10-18 01:03:52 +02:00
beorn7
6a8cacdf6f model/histogram: Fix checkHistogramCustomBounds to accept -Inf
Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-10 23:10:32 +02:00
Naman-B-Parlecha
1df1f53ea0 fix: Added Unroll support to Sparse NHCBs
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-10-10 19:12:30 +05:30
NamanParlecha
167cb350f1
Merge branch 'prometheus:main' into NamanParlecha/NHCBtoCH 2025-10-10 18:59:53 +05:30
beorn7
51e0982c91 promql(histograms): Fix counter reset hint handling when aggregating
Fixes #17308.

As explained adding the warn-annotation about conflicting counter
reset hints doesn't happen consistently. Furthermore, because of
incremental mean calculation being used so far (which includes
subtraction), avg calculation always created gauge histograms.

The fix is to make Sub behave like Add WRT counter reset handling, and
then set the result of a subtraction to gauge explicitly in actual
PromQL subtraction (rather than using Sub for something else, like
incremental mean calculation). Also, track the presence of a
CounterReset hint and a NotCounterReset hint separately for the
entirety of aggregated histograms and create the warn-annotation based
on that.

As a minor fix, this commit also consistently creates the warn
annotation in aggregation to be about "aggregation" rather than
"subtraction" or "addition", because the latter are just internal
operations within the aggregation, which is not of interest for the
user.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-09 19:40:00 +02:00
Björn Rabenstein
f2fc492473
Merge pull request #17284 from linasm/custom-bucket-bounds-match-fn
NHCB: Separate CustomBucketBoundsMatch from FloatBucketsMatch
2025-10-07 15:38:59 +02:00
Naman-B-Parlecha
7871bcb465 fix(convert): error message
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-10-07 14:20:32 +05:30
Naman-B-Parlecha
79f3e76d89 fix(test): Comparing the labels correctly
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-10-07 00:22:25 +05:30
Naman-B-Parlecha
c072b0000a fix(convert): fix typos in comments
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-10-06 23:05:01 +05:30
Naman-B-Parlecha
083d0fa835 refactor(convert): updated tests and moved formatOpenMetricsFloat
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-10-06 22:56:45 +05:30
Linas Medziunas
c16db58061 NHCB: Reject custom bucket bounds with NaN value
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2025-10-06 16:37:28 +03:00
Linas Medziunas
8caf1f1c41 [NHCB] Separate CustomBucketBoundsMatch from floatBucketsMatch
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2025-10-05 22:38:07 +03:00
beorn7
3d7cf4c274 model/histogram: Validate non-negative count and zero bucket
We have always validated that none of the bucket is negative. We
should do the same for the count of observations and the zero bucket.

Note that this was always implied in the protobuf exposition format
because a count or a zero bucket population is ignored if it is not
positive.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-01 16:40:41 +02:00
Bryan Boreham
7056c70647
Merge pull request #16851 from jingchanglu/main
chore: fix some function names in comment
2025-09-30 12:54:48 +01:00
Naman-B-Parlecha
ed67a0cbf1 refactor(histogram): rename types for clarity in histogram conversion tests
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-09-25 17:40:10 +05:30
Naman-B-Parlecha
f71f911040 fix(lint): Changing tests
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-09-25 15:28:25 +05:30
Naman-B-Parlecha
73904b4c75 refactor(histogram): Converting to Absolute values and fixing the test
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-09-25 03:42:23 +05:30
György Krajcsovits
b6df8d3274
feat(chunkenc): allow more native histograms schemas
Allow -9..52 schemas instead of just -4..8, but reduce resolution to 8 if
above.

The reduce code path will be slow, but we only expect it to happen if
TSDB already has higher resolution samples and we are in a rollback.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

# Conflicts:
#	model/histogram/generic.go
2025-09-23 11:20:48 +02:00
György Krajcsovits
794c545930
Merge remote-tracking branch 'origin/main' into krajo/native-histogram-schema-validation 2025-09-23 10:51:02 +02:00
Minh Nguyen
d04550a9c4
[RW2] Return 400 error code for wrongly-formatted histograms (#17210)
* return 400 error code

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* add more cases

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* format code

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* nit_fixing

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-09-23 07:24:46 +02:00
György Krajcsovits
5b39b79f5a
refactor error creation and tests
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-19 09:26:34 +02:00
George Krajcsovits
5e6900558a
Apply suggestions from code review
Co-authored-by: Björn Rabenstein <beorn@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
2025-09-19 08:58:27 +02:00
György Krajcsovits
267be7dc20
fix(chunkenc): error out when reading unknown histogram schemas from chunks
Otherwise higher level code like PromQL needs to constantly check if it
can handle the samples.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-18 09:21:03 +02:00
Naman-B-Parlecha
5eeba3638d
adding comment for ConvertNHCBToClassicHistogram
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-09-17 15:48:57 +05:30
Naman-B-Parlecha
c8e3f8c97a
drop(flag): moving feature flag to other pr
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-09-17 15:32:16 +05:30
György Krajcsovits
bdf547ae9c
fix(nativehistograms): validation should fail on unsupported schemas
Histogram.Validate and FloatHistogram.Validate now return error on
unsupported schemas.

Scrape and remote-write handler reduces the schema to the maximum allowed
if it is above the maximum, but below theoretical maximum of 52.
For scrape the maximum is a configuration option, for remote-write it is 8.

Note: OTLP endpont already does the reduction, without checking that it is
below 52 as the spec does not specify a maximum.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-13 16:54:44 +02:00
beorn7
747c5ee2b1 Apply analyzer "modernize" to the whole codebase
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.

This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency what
was done automatically.

Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-27 14:48:41 +02:00
Julius Hinze
77b5c3f217
Histograms: set annotation when adding or subtracting histograms that have not_reset and reset hints.
Signed-off-by: Julius Hinze <julius.hinze@grafana.com>
2025-08-20 15:00:45 +02:00
Julius Hinze
5855d973b0
model: set native histogram GaugeType hint when subtracting or multiplying/dividing with negative factors
Signed-off-by: Julius Hinze <julius.hinze@grafana.com>
2025-08-12 18:16:39 +02:00
jingchanglu
9ddb21fccb chore: fix some function names in comment
Signed-off-by: jingchanglu <jingchanglu@outlook.com>
2025-07-10 14:43:25 +08:00
György Krajcsovits
6c646657d5
perf(chunkenc): intern the custom values for native histograms
The custom values are the "le" bucket boundaries of native histograms
with custom buckets. They are never modified. It is ok to not copy them
when iterating a chunk, just reference them.

If we will ever have a function that modifies the custom values, like
'trim' for example. That function will have to make a copy on write.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-05-07 14:40:45 +02:00
Arve Knudsen
e7e3ab2824
Fix linting issues found by golangci-lint v2.0.2 (#16368)
* Fix linting issues found by golangci-lint v2.0.2

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-05-03 19:05:13 +02:00
Joel Beckmeyer
bdace97744 fix TestCuttingNewHeadChunks/really_large_histograms on 32-bit
Signed-off-by: Joel Beckmeyer <joel@beckmeyer.us>
2024-12-16 10:45:01 -05:00
Matthieu MOREL
af1a19fc78 enable errorf rule from perfsprint linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-11-06 16:50:36 +01:00
Neeraj Gartia
d4b1f9eb33
Corrects the behaviour of binary opperators between histogram and float (#14726)
promql: corrects binary operators functioning for mixed sample with histogram and float

For invalid pairings of sample types, an annotation is added now.

Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>

---------

Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
2024-10-15 14:44:36 +02:00
Björn Rabenstein
5b9148e552
Merge pull request #14820 from charleskorn/promqltest-native-histogram-format
promqltest: use test expression format for histograms in assertion failure messages and include reset hint in the test expression
2024-09-20 16:47:08 +02:00
Charles Korn
6dbb4e1a94
Fix linting issues
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-09-20 11:49:54 +10:00
Nathan Baulch
50cd453c8f
chore: Fix typos (#14868)
* Fix typos

---------

Signed-off-by: Nathan Baulch <nathan.baulch@gmail.com>
2024-09-10 22:32:03 +02:00
Charles Korn
e8c7482137
Return negative counts when multiplied or divided by a negative value
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-09-09 14:37:59 +10:00
Charles Korn
e67358d203
histogram: include counter reset hint in test expression output
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-09-04 15:46:52 +10:00
György Krajcsovits
505ffd34ef Fix lint error
Some weird formatting issue in using comment suggestion

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-06-24 09:33:03 +02:00
George Krajcsovits
f45709e710
Update model/histogram/histogram_test.go
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
2024-06-24 07:51:56 +02:00
Jeanette Tan
5e4e93c316 fix lint
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-06-07 19:24:05 +08:00