750 Commits

Author SHA1 Message Date
Björn Rabenstein
b8d19543b8
Add histogram validation in remote-read and during reducing resolution (#17561)
ReduceResolution is currently called before validation during
ingestion. This will cause a panic if there are not enough buckets in
the histogram. If there are too many buckets, the spurious buckets are
ignored, and therefore the error in the input histogram is masked.

Furthermore, invalid negative offsets might cause problems, too.

Therefore, we need to do some minimal validation in reduceResolution.
Fortunately, it is easy and shouldn't slow things down. Sadly, it
requires to return errors, which triggers a bunch of code changes.
Even here is a bright side, we can get rud of a few panics. (Remember:
Don't panic!)

In different news, we haven't done a full validation of histograms
read via remote-read. This is not so much a security concern (as you
can throw off Prometheus easily by feeding it bogus data via
remote-read) but more that remote-read sources might be makeshift and
could accidentally create invalid histograms. We really don't want to
panic in that case. So this commit does not only add a check of the
spans and buckets as needed for resolution reduction but also a full
validation during remote-read.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-11-21 00:22:24 +01:00
Minh Nguyen
5087a25848
Remote Write Receive Fix: Remove duplicate labels when type-and-unit-label feature is on (#17546)
* drop extra label from receiver

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* used constant

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-11-18 09:37:09 +00:00
Bartlomiej Plotka
cefefc6897
prw2: Move Remote Write 2.0 CT to be per Sample; Rename to ST (start timestamp) (#17411)
Relates to
https://github.com/prometheus/prometheus/issues/16944#issuecomment-3164760343

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-11-17 14:59:40 +00:00
Laurent Dufresne
d99f8dacc4
chore: remove dead code (#17542)
Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>
2025-11-17 10:37:55 +01:00
Bartlomiej Plotka
f50ff0a40a
feat: rename CreatedTimestamp to StartTimestamp (#17523)
Partially fixes https://github.com/prometheus/prometheus/issues/17416 by
renaming all CT* names to ST* in the whole codebase except RW2 (this is
done in separate
[PR](https://github.com/prometheus/prometheus/pull/17411)) and
PrometheusProto exposition proto.

```
CreatedTimestamp -> StartTimestamp
CreatedTimeStamp -> StartTimestamp
created_timestamp -> start_timestamp
CT -> ST
ct -> st

```

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-11-13 14:17:51 +00:00
Bartlomiej Plotka
675bafe2fb
Merge pull request #17441 from pipiland2612/refactor_queue_manger
Refactor part of queue_manger.go by creating struct to reuse some common function
2025-11-13 15:07:11 +01:00
Minh Nguyen
7ebff91cfd
OTLP Receiver: Only update metadata to WAL when metadata-wal-records feature is enabled (#17472)
OTLP Receiver: Only update metadata to WAL when metadata-wal-records feature is enabled.

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-11-13 09:53:12 +01:00
Ben Kochie
204249fcb5
Update golangci-lint (#17478)
* Update golangci-lint to v2.6.0
* Fixup various linting issues.
* Fixup deprecations.
* Add exception for `labels.MetricName` deprecation.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-05 13:47:34 +01:00
Minh Nguyen
30992dd032
[RW2] Fix: Only update metadata to WAL when metadata-wal-records feature is enabled (#17470)
* add feature check when UpdateMetadata

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* add appendMetadata boolean to write_hander

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-11-04 08:16:57 +00:00
Ben Kochie
48956f60d7
Update modernize (#17471)
Apply additional Go modernize tool improvements.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-04 05:13:49 +00:00
Minh Nguyen
784ec0a792
update test to test both v1 and v2 (#17467)
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-11-03 09:22:46 +00:00
pipiland2612
704afd8529 add timeSeriesAgeChecker to refactor filter code
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-31 23:19:53 +02:00
pipiland2612
9e6a626dae create timeSeriesStats to reduce return variable
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-31 22:17:45 +02:00
pipiland2612
e1cb29bf8a create common struct and function to DRY
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-31 21:55:14 +02:00
Minh Nguyen
c8f1de18a7
[RW2] Fix type and unit labels propagation in Remote Write v2 receiver to prioritize type-and-unit-labels feature (#17387)
* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix nits & update docs

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix docs

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-31 08:59:03 +00:00
György Krajcsovits
b8192127ee
Merge remote-tracking branch 'origin/release-3.7' into krajo/merge-3.7.3-to-main
# Conflicts:
#	CHANGELOG.md
#	storage/remote/queue_manager_test.go
2025-10-30 09:21:25 +01:00
Laurent Dufresne
7621eb772c histogram: Add Error type for all histogram errors
`histogram.Error` becomes the generic wrapper type for all histogram errors.
This makes it easier and less error prone when adding new errors to check if
an error is an histogram error as well as making it less error prone to convert
the errors.

This change the type of those specific sentinel errors from error to
`histogram.Error`, but it should almost never matter.
e.g., `errors.Is(err, ErrHistogram...)` would still work out of the box.

Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>
2025-10-30 08:45:34 +01:00
Ayoub Mrini
6806b68f93
[release-3.7] fix: Remote-write: revert changes in the queue resharding logic (#17412)
* Revert "chore: deprecate prometheus_remote_storage_{samples,exemplars,histograms}_in_total and prometheus_remote_storage_highest_timestamp_in_seconds"

This reverts commit ba14bc49db31a1b0ba3127e6ddf59a9f32a08dff.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

* Revert "storage/remote: compute highestTimestamp and dataIn at QueueManager level"

This reverts commit 184c7eb9186aa8fea09920f2f8e8aa8a603da300.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

* fix(remote-write): bring back the per queue metrics

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

* test(remote): add TestRemoteWrite_ReshardingWithoutDeadlock to reproduce the sharding scale up deadlock

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

---------

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-10-29 14:04:09 +00:00
Arve Knudsen
c36e966bf8
OTLP: de-duplicate target_info samples with conflicting timestamps (#17400)
Add logic to the target_info metric generation in the OTLP endpoint, so that any samples with the same timestamp for the same (target_info) series are de-duplicated. It comes out of a user's bug report about duplicated target_info samples in Grafana Mimir (which uses the Prometheus target_info generation logic).

If I'm not mistaken, duplicate target_info samples should stem from multiple resources in the same OTLP request being translated to the same target_info label set. It shouldn't be caused by a Prometheus bug.
2025-10-28 14:13:43 +00:00
Minh Nguyen
6bb367970e
feat(promtool): add RW2 support to promtool push metrics using client_golang library (#17280)
* Add WriteProto method and tests for promtool metrics

This commit adds:
1. WriteProto method to storage/remote/client.go that handles
   marshaling and compression of protobuf messages
2. Updated parseAndPushMetrics in cmd/promtool/metrics.go to use
   the new WriteProto method
3. Comprehensive tests for PushMetrics functionality

The WriteProto method provides a cleaner API for sending protobuf
messages without manually handling marshaling and compression.

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* use Write method from exp/api/remote

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix lint

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix test

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* nit fixed

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix lint

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-27 13:56:48 +00:00
Minh Nguyen
f070e35358
[RW]: Adopt client_golang/exp/api/remote types for receiving RW1 and RW2 (#17197)
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

# Conflicts:
#	storage/remote/write_handler.go

* add comment

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix failling test

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* nit_fixing

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix comment

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-24 10:31:34 +01:00
Julius Hinze
05612757b4
prometheusremotewrite: fix require.equal argument order (#17391)
Signed-off-by: Julius Hinze <julius.hinze@grafana.com>
2025-10-23 15:13:32 +02:00
Arve Knudsen
ef42c088ba
OTLP: Add configuration parameters to control label name translation (#17345)
As a follow-up to #17344, add two configuration parameters for controlling label
name translation, both defaulting to on for backwards compatibility (currently
these behaviours are hardcoded as enabled):

* otlp.label_name_underscore_sanitization => Prefix label names starting with a
  single underscore with key_ when translating OTel attribute names
* otlp.label_name_preserve_multiple_underscores => Keep multiple consecutive
  underscores in label names when translating OTel attribute names

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-10-22 08:27:35 +02:00
György Krajcsovits
ea398c15e8
Merge branch 'release-3.7' into krajo/merge-release-3071-to-main 2025-10-17 10:45:55 +02:00
Arve Knudsen
99d0967133 Fix lint failure
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-10-16 16:56:18 +02:00
Arve Knudsen
f5804e7cf2 Remove configuration parameters
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-10-16 16:37:24 +02:00
Arve Knudsen
3de3a296dd Add reviewer feedback
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-10-16 16:13:08 +02:00
Arve Knudsen
dd3a607d2d Add configuration parameters
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-10-16 16:13:08 +02:00
Arve Knudsen
7cf4b5da55 OTLP: Upgrade prometheus/otlptranslator
The upgrade to prometheus/otlptranslator@7f02967de0 fixes two label
name translation bugs, when in legacy name translation mode:
* 'key' is no longer prefixed when label names start with an underscore
* Multiple consecutive underscores are combined into one

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-10-16 16:13:08 +02:00
harsh kumar
16a9a827de
remote-write: Add type and unit labels to 2.0 receiver when feature flag enabled (#17329)
* feat(remote): add support for type and unit labels in write handler

Signed-off-by: Harsh <harshmastic@gmail.com>

* minor fixes

Signed-off-by: Harsh <harshmastic@gmail.com>

* fix failing tests

Signed-off-by: Harsh <harshmastic@gmail.com>

* Update storage/remote/write_handler.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: harsh kumar <135993950+hxrshxz@users.noreply.github.com>

* Update storage/remote/write_handler.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: harsh kumar <135993950+hxrshxz@users.noreply.github.com>

* refactor: streamline label handling for type and unit in write handler tests

Signed-off-by: Harsh <harshmastic@gmail.com>

* test: enhance V2 message tests for type and unit labels

Signed-off-by: Harsh <harshmastic@gmail.com>

---------

Signed-off-by: Harsh <harshmastic@gmail.com>
Signed-off-by: harsh kumar <135993950+hxrshxz@users.noreply.github.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2025-10-15 18:19:41 +01:00
beorn7
ad7d1aed99 Phase out native histogram feature flag
The detailed plan for this is laid out in
https://github.com/prometheus/prometheus/issues/16572 .

This commit adds a global and local scrape config option
`scrape_native_histograms`, which has to be set to true to ingest
native histograms.

To ease the transition, the feature flag is changed to simply set the
default of `scrape_native_histograms` to true.

Further implications:

- The default scrape protocols now depend on the
  `scrape_native_histograms` setting.
- Everywhere else, histograms are now "on by default".

Documentation beyond the one for the feature flag and the scrape
config are deliberately left out. See
https://github.com/prometheus/prometheus/pull/17232 for that.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-15 14:50:52 +02:00
Fiona Liao
9a5bccbd4b
refactor: make OTEL temporality check easier to read (#16692)
* Make OTEL temporality check easier to read
* Add nolint comment

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
2025-10-14 13:29:23 +02:00
George Krajcsovits
fe11cae637
Merge pull request #17287 from linasm/reject-nan-histogram-custom-bounds
NHCB: Reject custom bucket bounds with NaN value
2025-10-06 18:11:03 +02:00
Linas Medziunas
c16db58061 NHCB: Reject custom bucket bounds with NaN value
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2025-10-06 16:37:28 +03:00
Minh Nguyen
106e6f2c77
[RW2] Return 400 for Exemplars without Series or Histograms not written (#17250)
* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix cmt

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-06 12:53:44 +01:00
beorn7
3d7cf4c274 model/histogram: Validate non-negative count and zero bucket
We have always validated that none of the bucket is negative. We
should do the same for the count of observations and the zero bucket.

Note that this was always implied in the protobuf exposition format
because a count or a zero bucket population is ignored if it is not
positive.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-01 16:40:41 +02:00
Bryan Boreham
0d3ec05056
Merge pull request #17043 from machine424/ffl
chore: allow seamless use of testing/synctest for >=go1.24
2025-09-30 12:11:12 +01:00
György Krajcsovits
a5a6413c1a
better errors naming and formatting, typo fixes
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-23 11:20:55 +02:00
György Krajcsovits
6e42da8904
feat(remote): reduce resolution of native histograms on remote read
If a sample read through remote read has too high resolution,
reduce it to the maximum allowed.

This is a slow path, but we only expect it to happen if the server
side is newer version that allows higher resolution.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-23 11:20:55 +02:00
György Krajcsovits
794c545930
Merge remote-tracking branch 'origin/main' into krajo/native-histogram-schema-validation 2025-09-23 10:51:02 +02:00
Minh Nguyen
d04550a9c4
[RW2] Return 400 error code for wrongly-formatted histograms (#17210)
* return 400 error code

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* add more cases

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* format code

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* nit_fixing

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-09-23 07:24:46 +02:00
machine424
365409d3be
chore: allow seamless use of testing/synctest for >=go1.24
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-09-19 22:48:25 +02:00
György Krajcsovits
5b39b79f5a
refactor error creation and tests
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-19 09:26:34 +02:00
György Krajcsovits
b99378f2c4
Merge remote-tracking branch 'origin/main' into krajo/native-histogram-schema-validation 2025-09-19 08:59:00 +02:00
George Krajcsovits
5e6900558a
Apply suggestions from code review
Co-authored-by: Björn Rabenstein <beorn@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
2025-09-19 08:58:27 +02:00
György Krajcsovits
f0a297bb7c
fix(remote): validate native histogram schema in remote read
When remote read returns chunks, the validation is in tsdb/chunkenc.
However when it returns samples, we need to modify the iterator to
validate.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-18 11:09:45 +02:00
machine424
8462515c75
test(storage/remote/queue_manager_test.go): use synctest in TestShutdown for better
control over time

The test becomes flaky after it was asked to run on parallel
and "fight" for resources

let's hide all of that

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-09-17 11:20:07 +02:00
György Krajcsovits
0cf54d7819
perf(otlp): reduce logs from OTLP endpoint
It's not possible to store created timestamp at the same timestamp as
the current sample, so do not even try.

In OpenTelemetry spec, if the start time is unknown, it will be set to
the same timestamp as the first sample.
https://opentelemetry.io/docs/specs/otel/metrics/data-model/#cumulative-streams-handling-unknown-start-time
This means that we will get a lot of duplicate sample for timestamp
errors and we should not log those.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-17 08:50:43 +02:00
György Krajcsovits
bdf547ae9c
fix(nativehistograms): validation should fail on unsupported schemas
Histogram.Validate and FloatHistogram.Validate now return error on
unsupported schemas.

Scrape and remote-write handler reduces the schema to the maximum allowed
if it is above the maximum, but below theoretical maximum of 52.
For scrape the maximum is a configuration option, for remote-write it is 8.

Note: OTLP endpont already does the reduction, without checking that it is
below 52 as the spec does not specify a maximum.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-13 16:54:44 +02:00
Minh Nguyen
0fc2547740
Handle error gracefully for the desymbolizeLabels function in prompb/io/prometheus/write/v2/symbols.go (#17160)
Signed-off-by: pipiland <user123@Minhs-Macbook.local>

---------

Signed-off-by: pipiland <user123@Minhs-Macbook.local>
Co-authored-by: pipiland <user123@Minhs-Macbook.local>
2025-09-08 13:04:55 -07:00