593 Commits

Author SHA1 Message Date
Björn Rabenstein
b8d19543b8
Add histogram validation in remote-read and during reducing resolution (#17561)
ReduceResolution is currently called before validation during
ingestion. This will cause a panic if there are not enough buckets in
the histogram. If there are too many buckets, the spurious buckets are
ignored, and therefore the error in the input histogram is masked.

Furthermore, invalid negative offsets might cause problems, too.

Therefore, we need to do some minimal validation in reduceResolution.
Fortunately, it is easy and shouldn't slow things down. Sadly, it
requires returning errors, which triggers a bunch of code changes.
Even here there is a bright side: we can get rid of a few panics.
(Remember: Don't panic!)

In different news, we haven't done a full validation of histograms
read via remote-read. This is not so much a security concern (as you
can throw off Prometheus easily by feeding it bogus data via
remote-read) but more that remote-read sources might be makeshift and
could accidentally create invalid histograms. We really don't want to
panic in that case. So this commit does not only add a check of the
spans and buckets as needed for resolution reduction but also a full
validation during remote-read.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-11-21 00:22:24 +01:00
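
Illustration for b8d19543b8: a minimal sketch of the kind of cheap span/bucket check the message describes, returning errors instead of panicking. The `Span` type, the error variables, and `checkSpansAndBuckets` are hypothetical stand-ins, not the actual Prometheus code. The rule that only the first span may carry a negative offset is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Span is a simplified, hypothetical stand-in for a histogram span:
// Offset is the gap to the previous span, Length the number of
// consecutive buckets the span describes.
type Span struct {
	Offset int32
	Length uint32
}

var (
	errNegativeOffset = errors.New("span has negative offset")
	errBucketMismatch = errors.New("span lengths do not match number of buckets")
)

// checkSpansAndBuckets verifies that the spans describe exactly
// numBuckets buckets and that no span after the first has a negative
// offset. It returns an error rather than panicking.
func checkSpansAndBuckets(spans []Span, numBuckets int) error {
	total := 0
	for i, s := range spans {
		if i > 0 && s.Offset < 0 {
			return fmt.Errorf("span %d: %w", i, errNegativeOffset)
		}
		total += int(s.Length)
	}
	if total != numBuckets {
		return fmt.Errorf("spans describe %d buckets, got %d: %w", total, numBuckets, errBucketMismatch)
	}
	return nil
}

func main() {
	spans := []Span{{Offset: 0, Length: 2}, {Offset: 3, Length: 1}}
	fmt.Println(checkSpansAndBuckets(spans, 3)) // consistent input
	fmt.Println(checkSpansAndBuckets(spans, 2)) // too few buckets
	fmt.Println(checkSpansAndBuckets(spans, 5)) // too many buckets
}
```

Checking both directions of the mismatch covers the two failure modes named in the message: too few buckets (previously a panic) and too many buckets (previously masked).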
beorn7
2dfc324821 model/histogram: Make histogram bucket iterators more robust
Currently, iterating over histogram buckets can panic if the spans are
not consistent with the buckets. We aim to validate histograms upon
ingestion, but there might still be data corruptions on disk that
could trigger the panic. While data corruption on disk is really bad
and will lead to all kinds of weirdness, we should still avoid
panicking.

Note, though, that chunks are secured by checksums, so the corruptions
won't realistically happen because of disk faults, but more likely
because a chunk was generated in a faulty way in the first place, by
a software bug or even maliciously.

This commit prevents panics in the situation where there are fewer
buckets than described by the spans. Note that the missing buckets
will simply not be iterated over. There is no signalling of this
problem. We might still consider this separately, but for now, I would
say that this kind of corruption is exceedingly rare and doesn't
deserve special treatment (which will add a whole lot of complexity to
the code).

Signed-off-by: beorn7 <beorn@grafana.com>
2025-11-19 16:37:51 +01:00
Grégoire
1174b0ce4f
model/textparse: Remove unit validation in protobuf parsing (#16834)
Signed-off-by: Gregoire Verdier <gregoire.verdier@gmail.com>
2025-11-19 14:03:32 +01:00
Bartlomiej Plotka
f50ff0a40a
feat: rename CreatedTimestamp to StartTimestamp (#17523)
Partially fixes https://github.com/prometheus/prometheus/issues/17416 by
renaming all CT* names to ST* in the whole codebase except RW2 (this is
done in separate
[PR](https://github.com/prometheus/prometheus/pull/17411)) and
PrometheusProto exposition proto.

```
CreatedTimestamp -> StartTimestamp
CreatedTimeStamp -> StartTimestamp
created_timestamp -> start_timestamp
CT -> ST
ct -> st

```

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-11-13 14:17:51 +00:00
Bryan Boreham
a57aea2915
Improve assertion failure message (#17252)
Signed-off-by: Charles Korn <charles.korn@grafana.com>
Co-authored-by: Charles Korn <charles.korn@grafana.com>
2025-11-12 11:53:32 +01:00
Ben Kochie
204249fcb5
Update golangci-lint (#17478)
* Update golangci-lint to v2.6.0
* Fixup various linting issues.
* Fixup deprecations.
* Add exception for `labels.MetricName` deprecation.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-05 13:47:34 +01:00
Ben Kochie
48956f60d7
Update modernize (#17471)
Apply additional Go modernize tool improvements.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-04 05:13:49 +00:00
Julius Volz
0093e2159e
Merge pull request #17337 from prometheus/ui/visualize-relabel-steps
ui: Allow viewing detailed relabeling steps for each discovered target
2025-11-02 13:51:55 +01:00
Laurent Dufresne
a6793c20e8 Added tests for histogram.Error
Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>
2025-10-30 08:47:03 +01:00
Laurent Dufresne
7621eb772c histogram: Add Error type for all histogram errors
`histogram.Error` becomes the generic wrapper type for all histogram errors.
This makes it easier and less error-prone to check whether an error is a
histogram error when new errors are added, and less error-prone to convert
the errors.

This changes the type of those specific sentinel errors from error to
`histogram.Error`, but it should almost never matter.
E.g., `errors.Is(err, ErrHistogram...)` would still work out of the box.

Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>
2025-10-30 08:45:34 +01:00
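
Illustration for 7621eb772c: a minimal sketch of a wrapper error type whose sentinel values still work with `errors.Is`, while `errors.As` can detect the whole family at once. The type layout, `ErrCountMismatch`, `errOther`, `ingest`, and `isHistogramError` are assumptions made for this example and are not taken from the Prometheus source.

```go
package main

import (
	"errors"
	"fmt"
)

// Error is a hypothetical wrapper marking an error as histogram-related.
// Embedding the error interface promotes its Error() method.
type Error struct {
	error
}

// Unwrap exposes the wrapped error to errors.Is and errors.As.
func (e Error) Unwrap() error { return e.error }

// ErrCountMismatch is a sentinel whose static type is Error, not error.
var ErrCountMismatch = Error{errors.New("histogram count mismatch")}

// errOther is an unrelated error used for comparison.
var errOther = errors.New("unrelated failure")

// ingest simulates a caller that adds context to a sentinel error.
func ingest() error {
	return fmt.Errorf("ingest failed: %w", ErrCountMismatch)
}

// isHistogramError reports whether any error in the chain is an Error.
func isHistogramError(err error) bool {
	var he Error
	return errors.As(err, &he)
}

func main() {
	err := ingest()
	fmt.Println(errors.Is(err, ErrCountMismatch)) // sentinel comparison still works
	fmt.Println(isHistogramError(err))            // family check via errors.As
	fmt.Println(isHistogramError(errOther))       // unrelated error is not matched
}
```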
George Krajcsovits
37418b5910
Merge pull request #17166 from Naman-B-Parlecha/NamanParlecha/NHCBtoCH
Unroll NHCBs to Classic Histograms func for RW
2025-10-30 08:44:26 +01:00
Naman-B-Parlecha
f14c515cbe fix(histogram): handling +Inf bucket count and metric label
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-10-28 20:29:44 +05:30
György Krajcsovits
fbd5353a19
Merge remote-tracking branch 'origin/release-3.7' into krajo/merge-release-372-to-main
2025-10-22 18:02:22 +02:00
Julien Pivotto
c9d4689e0b relabeling: Fix labelmap action validation with legacy metric name scheme
Fixes #17370

In Prometheus v3.7.0, using labelmap actions with replacement patterns
containing regex variables (e.g., `$1`, `${1}`) would fail validation
when `metric_name_validation_scheme` was set to `legacy`, causing
Prometheus to fail at startup with:
  "$1" is invalid 'replacement' for labelmap action

This was a regression as the same configuration worked in v3.6.0.

The issue was in the validation logic: while UTF-8 validation correctly
allowed `$` characters, legacy validation incorrectly used
`IsValidLabelName` which rejects `$` characters. The fix ensures legacy
validation uses `relabelTargetLegacy` regex which explicitly supports
regex template variables.

Added test cases to verify labelmap validation works with both `$1` and
`${1}` replacement patterns under legacy validation scheme.

Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
2025-10-22 10:13:06 +02:00
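
Illustration for c9d4689e0b: a minimal sketch of why a plain legacy label-name check rejects `$1` while a template-aware pattern accepts it. Both regular expressions and both function names are assumptions written for this example. They are not copied from `IsValidLabelName` or `relabelTargetLegacy`.

```go
package main

import (
	"fmt"
	"regexp"
)

// legacyLabelName is an assumed pattern for a plain legacy label name.
// It has no notion of "$", so regex template variables are rejected.
var legacyLabelName = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

// legacyRelabelTarget is an assumed pattern for a legacy relabel target.
// It additionally accepts template variables such as $1 and ${1}.
var legacyRelabelTarget = regexp.MustCompile(`^(?:(?:[a-zA-Z_]|\$(?:\{\w+\}|\w+))+\w*)+$`)

// isLegacyLabelName reports whether s is a plain legacy label name.
func isLegacyLabelName(s string) bool { return legacyLabelName.MatchString(s) }

// isLegacyRelabelTarget reports whether s is acceptable as a labelmap
// replacement under the assumed template-aware pattern.
func isLegacyRelabelTarget(s string) bool { return legacyRelabelTarget.MatchString(s) }

func main() {
	for _, s := range []string{"$1", "${1}", "prefix_$1", "plain_name", "1bad"} {
		fmt.Printf("%-12q labelName=%-5v relabelTarget=%v\n",
			s, isLegacyLabelName(s), isLegacyRelabelTarget(s))
	}
}
```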
Linas Medžiūnas
44df626620
promql (histograms): reconcile mismatched NHCB bounds (#17278)
Fixes #17255.

The implementation happens mostly in the Add and Sub method, but the reconciliation works for all relevant operations. For example, you can now `rate` over a range wherein the custom bucket boundaries are changing.

Any custom bucket reconciliation is flagged with an info-level annotation.

---------

Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
Signed-off-by: Linas Medžiūnas <linasm@users.noreply.github.com>
2025-10-18 01:03:52 +02:00
Julius Volz
8b1bd7d6c3 ui: Allow viewing detailed relabeling steps for each discovered target
This adds:

* A `ScrapePoolConfig()` method to the scrape manager that allows getting
  the scrape config for a given pool.
* An API endpoint at `/api/v1/targets/relabel_steps` that takes a pool name
  and a label set of a target and returns a detailed list of applied
  relabeling rules and their output for each step.
* A "show relabeling" link/button for each target on the discovery page
  that shows the detailed flow of all relabeling rules (based on the API
  response) for that target.

Note that this changes the JSON encoding of the relabeling rule config
struct to output the original snake_case (instead of camelCase) field names,
and before merging, we need to be sure that's ok :) See my comment about
that at https://github.com/prometheus/prometheus/pull/15383#issuecomment-3405591487

Fixes https://github.com/prometheus/prometheus/issues/17283

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2025-10-15 15:33:27 +02:00
beorn7
ad7d1aed99 Phase out native histogram feature flag
The detailed plan for this is laid out in
https://github.com/prometheus/prometheus/issues/16572 .

This commit adds a global and local scrape config option
`scrape_native_histograms`, which has to be set to true to ingest
native histograms.

To ease the transition, the feature flag is changed to simply set the
default of `scrape_native_histograms` to true.

Further implications:

- The default scrape protocols now depend on the
  `scrape_native_histograms` setting.
- Everywhere else, histograms are now "on by default".

Documentation beyond that for the feature flag and the scrape
config is deliberately left out. See
https://github.com/prometheus/prometheus/pull/17232 for that.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-15 14:50:52 +02:00
beorn7
6a8cacdf6f model/histogram: Fix checkHistogramCustomBounds to accept -Inf
Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-10 23:10:32 +02:00
Naman-B-Parlecha
1df1f53ea0 fix: Added Unroll support to Sparse NHCBs
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-10-10 19:12:30 +05:30
NamanParlecha
167cb350f1
Merge branch 'prometheus:main' into NamanParlecha/NHCBtoCH
2025-10-10 18:59:53 +05:30
beorn7
51e0982c91 promql(histograms): Fix counter reset hint handling when aggregating
Fixes #17308.

As explained there, the warn annotation about conflicting counter
reset hints isn't added consistently. Furthermore, because incremental
mean calculation has been used so far (which includes subtraction),
avg calculation always created gauge histograms.

The fix is to make Sub behave like Add WRT counter reset handling, and
then set the result of a subtraction to gauge explicitly in actual
PromQL subtraction (rather than using Sub for something else, like
incremental mean calculation). Also, track the presence of a
CounterReset hint and a NotCounterReset hint separately for the
entirety of aggregated histograms and create the warn-annotation based
on that.

As a minor fix, this commit also consistently creates the warn
annotation in aggregation to be about "aggregation" rather than
"subtraction" or "addition", because the latter are just internal
operations within the aggregation, which is not of interest for the
user.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-09 19:40:00 +02:00
Björn Rabenstein
f2fc492473
Merge pull request #17284 from linasm/custom-bucket-bounds-match-fn
NHCB: Separate CustomBucketBoundsMatch from FloatBucketsMatch
2025-10-07 15:38:59 +02:00
Bartlomiej Plotka
a4da440dad
fix: Fix slicelabels corruption when used with proto decoding (#17150)
* fix: Fix slicelabels corruption when used with proto decoding

Alternative to https://github.com/prometheus/prometheus/pull/16957/

Signed-off-by: bwplotka <bwplotka@gmail.com>

* addressed comments

Signed-off-by: bwplotka <bwplotka@gmail.com>

---------

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-10-07 12:06:48 +01:00
Naman-B-Parlecha
7871bcb465 fix(convert): error message
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-10-07 14:20:32 +05:30
Naman-B-Parlecha
79f3e76d89 fix(test): Comparing the labels correctly
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-10-07 00:22:25 +05:30
Naman-B-Parlecha
c072b0000a fix(convert): fix typos in comments
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-10-06 23:05:01 +05:30
Naman-B-Parlecha
083d0fa835 refactor(convert): updated tests and moved formatOpenMetricsFloat
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-10-06 22:56:45 +05:30
Linas Medziunas
c16db58061 NHCB: Reject custom bucket bounds with NaN value
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2025-10-06 16:37:28 +03:00
Linas Medziunas
8caf1f1c41 [NHCB] Separate CustomBucketBoundsMatch from floatBucketsMatch
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2025-10-05 22:38:07 +03:00
Bryan Boreham
968d722bb2
Merge pull request #17212 from bboreham/no-simplify
[PERF] Regex: stop calling Simplify
2025-10-02 10:51:04 +01:00
beorn7
3d7cf4c274 model/histogram: Validate non-negative count and zero bucket
We have always validated that none of the buckets is negative. We
should do the same for the count of observations and the zero bucket.

Note that this was always implied in the protobuf exposition format
because a count or a zero bucket population is ignored if it is not
positive.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-01 16:40:41 +02:00
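
Illustration for 3d7cf4c274: a minimal sketch that extends a per-bucket non-negativity check to the observation count and the zero bucket. The `floatHistogram` struct, `errNegative`, and `validateNonNegative` are simplified, hypothetical stand-ins for the real types.

```go
package main

import (
	"errors"
	"fmt"
)

// floatHistogram is a simplified, hypothetical float histogram.
type floatHistogram struct {
	Count     float64   // total number of observations
	ZeroCount float64   // population of the zero bucket
	Buckets   []float64 // populations of the regular buckets
}

var errNegative = errors.New("negative value in histogram")

// validateNonNegative rejects negative bucket populations and applies
// the same rule to the observation count and the zero bucket.
func validateNonNegative(h floatHistogram) error {
	if h.Count < 0 {
		return fmt.Errorf("count is %g: %w", h.Count, errNegative)
	}
	if h.ZeroCount < 0 {
		return fmt.Errorf("zero bucket is %g: %w", h.ZeroCount, errNegative)
	}
	for i, b := range h.Buckets {
		if b < 0 {
			return fmt.Errorf("bucket %d is %g: %w", i, b, errNegative)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateNonNegative(floatHistogram{Count: 3, ZeroCount: 1, Buckets: []float64{2}}))
	fmt.Println(validateNonNegative(floatHistogram{Count: -1}))
	fmt.Println(validateNonNegative(floatHistogram{Count: 1, ZeroCount: -1}))
}
```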
Charles Korn
a2adccadd2 Improve assertion failure message
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2025-10-01 09:30:24 +01:00
Bryan Boreham
7056c70647
Merge pull request #16851 from jingchanglu/main
chore: fix some function names in comment
2025-09-30 12:54:48 +01:00
Naman-B-Parlecha
ed67a0cbf1 refactor(histogram): rename types for clarity in histogram conversion tests
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-09-25 17:40:10 +05:30
Naman-B-Parlecha
f71f911040 fix(lint): Changing tests
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-09-25 15:28:25 +05:30
Naman-B-Parlecha
73904b4c75 refactor(histogram): Converting to Absolute values and fixing the test
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-09-25 03:42:23 +05:30
György Krajcsovits
b6df8d3274
feat(chunkenc): allow more native histograms schemas
Allow -9..52 schemas instead of just -4..8, but reduce resolution to 8 if
above.

The reduce code path will be slow, but we only expect it to happen if
TSDB already has higher resolution samples and we are in a rollback.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

# Conflicts:
#	model/histogram/generic.go
2025-09-23 11:20:48 +02:00
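
Illustration for b6df8d3274: a minimal sketch of what reducing the resolution of an exponential histogram means for bucket indices. It assumes the standard exponential layout, in which lowering the schema by d merges 2^d neighbouring buckets into one. The map-based representation and both function names are choices made for this example. The real code works on spans and bucket slices.

```go
package main

import "fmt"

// targetIdx maps a bucket index at originSchema to the index of the
// bucket that contains it at the lower-resolution targetSchema.
// It assumes originSchema >= targetSchema.
func targetIdx(idx, originSchema, targetSchema int32) int32 {
	return ((idx - 1) >> uint(originSchema-targetSchema)) + 1
}

// reduceBuckets merges bucket populations, keyed by bucket index, from
// originSchema down to targetSchema.
func reduceBuckets(buckets map[int32]uint64, originSchema, targetSchema int32) map[int32]uint64 {
	out := make(map[int32]uint64, len(buckets))
	for idx, count := range buckets {
		out[targetIdx(idx, originSchema, targetSchema)] += count
	}
	return out
}

func main() {
	// Schema 10 is reduced to schema 8, so four buckets merge into one.
	in := map[int32]uint64{1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
	fmt.Println(reduceBuckets(in, 10, 8))
}
```

The arithmetic right shift rounds toward negative infinity, so the same formula also holds for zero and negative bucket indices.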
György Krajcsovits
794c545930
Merge remote-tracking branch 'origin/main' into krajo/native-histogram-schema-validation
2025-09-23 10:51:02 +02:00
Minh Nguyen
d04550a9c4
[RW2] Return 400 error code for wrongly-formatted histograms (#17210)
* return 400 error code

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* add more cases

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* format code

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* nit_fixing

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-09-23 07:24:46 +02:00
György Krajcsovits
5b39b79f5a
refactor error creation and tests
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-19 09:26:34 +02:00
George Krajcsovits
5e6900558a
Apply suggestions from code review
Co-authored-by: Björn Rabenstein <beorn@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
2025-09-19 08:58:27 +02:00
Bryan Boreham
c743b2f3cd [PERF] Regex: stop calling Simplify
It slows down compilation and doesn't make any of our benchmarks go faster.
Assumed to be something that helped at an earlier point, but doesn't help now.

Add a benchmark with a more complicated regex to demonstrate the slowdown.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-09-18 11:20:14 +01:00
György Krajcsovits
267be7dc20
fix(chunkenc): error out when reading unknown histogram schemas from chunks
Otherwise higher level code like PromQL needs to constantly check if it
can handle the samples.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-18 09:21:03 +02:00
Naman-B-Parlecha
5eeba3638d
adding comment for ConvertNHCBToClassicHistogram
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-09-17 15:48:57 +05:30
Naman-B-Parlecha
c8e3f8c97a
drop(flag): moving feature flag to other pr
Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-09-17 15:32:16 +05:30
György Krajcsovits
bdf547ae9c
fix(nativehistograms): validation should fail on unsupported schemas
Histogram.Validate and FloatHistogram.Validate now return error on
unsupported schemas.

The scrape and remote-write handlers reduce the schema to the maximum
allowed if it is above the maximum but below the theoretical maximum of 52.
For scrape, the maximum is a configuration option; for remote-write, it is 8.

Note: The OTLP endpoint already does the reduction, without checking that it
is below 52, as the spec does not specify a maximum.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-13 16:54:44 +02:00
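
Illustration for bdf547ae9c: a minimal sketch of the decision described above, assuming exponential schemas only. The constants, `errUnsupportedSchema`, and `targetSchema` are hypothetical. The lower bound of -4 and the inclusive upper bound of 52 are assumptions read off the commit messages, and custom-bucket schemas are ignored here.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	minSchema            int32 = -4 // assumed lowest supported exponential schema
	maxTheoreticalSchema int32 = 52 // theoretical maximum named in the commit message
)

var errUnsupportedSchema = errors.New("unsupported histogram schema")

// targetSchema returns the schema to use for an incoming histogram.
// Schemas above limit but within the theoretical maximum are reduced
// to limit. Anything outside the assumed range is rejected.
func targetSchema(schema, limit int32) (int32, error) {
	if schema < minSchema || schema > maxTheoreticalSchema {
		return 0, fmt.Errorf("schema %d: %w", schema, errUnsupportedSchema)
	}
	if schema > limit {
		return limit, nil
	}
	return schema, nil
}

func main() {
	for _, s := range []int32{3, 8, 10, 52, 53, -5} {
		got, err := targetSchema(s, 8) // 8 is the remote-write limit from the message
		fmt.Println(s, got, err)
	}
}
```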
NamanParlecha
594f9d63a5
refactor(textparse): Introduce Variadic options in textParse.New (#17155)
* refactor(textparse): introduce ParserOptions struct for cleaner parser initialization

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(fuzz): update fuzzParseMetricWithContentType to use ParserOptions

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parser): simplify ParserOptions usage in tests and implementations

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parse): using variadic options

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parser): add fallbackType & SymbolTable to variadic options

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parser): private fields

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(scrape): compose parser options

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parser): add comments

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parser): update to use ParserOptions struct for configuration

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(scrape): remove unused parserOptions field from scrapeLoop

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parser): update ParserOptions field names and add comments for clarity

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

---------

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-09-11 10:49:42 +01:00
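
Illustration for 594f9d63a5: a minimal sketch of the general variadic (functional) options pattern in Go that the title refers to. Every name here (`parserConfig`, `ParserOption`, `WithFallbackType`, `WithNHCBConversion`, `newParserConfig`) and the default value are hypothetical. This is not the actual `textparse.New` signature.

```go
package main

import "fmt"

// parserConfig holds hypothetical parser settings in private fields.
type parserConfig struct {
	fallbackType string
	convertNHCB  bool
}

// ParserOption mutates a parserConfig. Callers pass any number of them.
type ParserOption func(*parserConfig)

// WithFallbackType sets the content type used when none is detected.
func WithFallbackType(t string) ParserOption {
	return func(c *parserConfig) { c.fallbackType = t }
}

// WithNHCBConversion enables conversion of classic histograms to NHCB.
func WithNHCBConversion() ParserOption {
	return func(c *parserConfig) { c.convertNHCB = true }
}

// newParserConfig applies the options in order on top of the defaults.
func newParserConfig(opts ...ParserOption) parserConfig {
	cfg := parserConfig{fallbackType: "text/plain"}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", newParserConfig())
	fmt.Printf("%+v\n", newParserConfig(WithFallbackType("application/openmetrics-text"), WithNHCBConversion()))
}
```

With this pattern, new settings can be added without changing existing call sites, which is the usual motivation for moving away from a long positional parameter list.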
George Krajcsovits
acd9aa0afb
fix(textparse/protobuf): metric family name corrupted by NHCB parser (#17156)
* fix(textparse): implement NHCB parsing in ProtoBuf parser directly

The NHCB conversion does some validation, but we can only return error
from Parser.Next() not Parser.Histogram(). So the conversion needs to
happen in Next().

There are 2 cases:
1. "always_scrape_classic_histograms" is enabled, in which case we
convert after returning the classic series. This is to be consistent
with the PromParser text parser, which collects NHCB while spitting out
classic series; then returns the NHCB.
2. "always_scrape_classic_histograms" is disabled, in which case we never
return the classic series.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

* refactor(textparse): skip classic series instead of adding NHCB around

Do not return the first classic series from the EntryType state,
switch to EntrySeries. This means we need to start the histogram
field state from -3, not -2.

In EntrySeries, skip classic series if needed.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

* reuse nhcb converter

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

* test(textparse/nhcb): test corrupting metric family name

The NHCB parser doesn't always copy the metric name from the underlying
parser. When called via HELP or UNIT, the string is referenced directly,
which means that the read-ahead of NHCB can corrupt it.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-08 17:26:41 +02:00
Arve Knudsen
913cc8f72b
Replace gopkg.in/yaml.v2 with go.yaml.in/yaml/v2 (#17151)
* Replace gopkg.in/yaml.v2 with go.yaml.in/yaml/v2
* Upgrade to client_golang@v1.23.2

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-09-06 13:04:24 +02:00
George Krajcsovits
31e4d84edd
refactor(textparse): allow for parsers with direct NHCB support (#17153)
Hide the addition of the NHCB parser on top of another parser in the New()
function so we can easily add direct NHCB capable parsers.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-06 11:45:44 +02:00