1685 Commits

Author SHA1 Message Date
bwplotka
652ea5541b feat(storage): switch to AppenderV2 (to split)
Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-12-01 10:58:28 +00:00
bwplotka
cb83cf5d92 refactor(appenderV2): add TSDB AppenderV2 implementation
Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-12-01 10:52:53 +00:00
bwplotka
6724ba2a1d refactor(appenderV2): 1:1 copy of head_append.go -> head_append_v2.go (starting point)
Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-12-01 10:52:51 +00:00
bwplotka
8e570fe0f3 refactor(appenderV2): add AppenderV2 interface
Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-12-01 09:40:08 +00:00
George Krajcsovits
a66c696530
chore(storage): update docstring (#17609)
The original implementation in #9705 for native histograms included a
technical debt #15177 where samples were committed ordered by type
not by their append order. This was fixed in #17071, but this docstring
was not updated.

I've also taken the liberty of mentioning that we do not order by timestamp
either, so it is possible to append out-of-order samples.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-11-25 15:33:35 +01:00
Björn Rabenstein
b8d19543b8
Add histogram validation in remote-read and during reducing resolution (#17561)
ReduceResolution is currently called before validation during
ingestion. This will cause a panic if there are not enough buckets in
the histogram. If there are too many buckets, the spurious buckets are
ignored, and therefore the error in the input histogram is masked.

Furthermore, invalid negative offsets might cause problems, too.

Therefore, we need to do some minimal validation in reduceResolution.
Fortunately, it is easy and shouldn't slow things down. Sadly, it
requires returning errors, which triggers a bunch of code changes.
Even here there is a bright side: we can get rid of a few panics. (Remember:
Don't panic!)

In other news, we haven't been doing a full validation of histograms
read via remote-read. This is not so much a security concern (you can
easily throw off Prometheus by feeding it bogus data via remote-read
anyway); rather, remote-read sources might be makeshift and could
accidentally create invalid histograms. We really don't want to
panic in that case. So this commit not only adds a check of the
spans and buckets as needed for resolution reduction but also a full
validation during remote-read.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-11-21 00:22:24 +01:00
Minh Nguyen
5087a25848
Remote Write Receive Fix: Remove duplicate labels when type-and-unit-label feature is on (#17546)
* drop extra label from receiver

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* used constant

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-11-18 09:37:09 +00:00
Bartlomiej Plotka
cefefc6897
prw2: Move Remote Write 2.0 CT to be per Sample; Rename to ST (start timestamp) (#17411)
Relates to
https://github.com/prometheus/prometheus/issues/16944#issuecomment-3164760343

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-11-17 14:59:40 +00:00
Laurent Dufresne
d99f8dacc4
chore: remove dead code (#17542)
Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>
2025-11-17 10:37:55 +01:00
Bartlomiej Plotka
f50ff0a40a
feat: rename CreatedTimestamp to StartTimestamp (#17523)
Partially fixes https://github.com/prometheus/prometheus/issues/17416 by
renaming all CT* names to ST* in the whole codebase except RW2 (this is
done in a separate
[PR](https://github.com/prometheus/prometheus/pull/17411)) and
PrometheusProto exposition proto.

```
CreatedTimestamp -> StartTimestamp
CreatedTimeStamp -> StartTimestamp
created_timestamp -> start_timestamp
CT -> ST
ct -> st

```

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-11-13 14:17:51 +00:00
Bartlomiej Plotka
675bafe2fb
Merge pull request #17441 from pipiland2612/refactor_queue_manger
Refactor part of queue_manger.go by creating struct to reuse some common function
2025-11-13 15:07:11 +01:00
Minh Nguyen
7ebff91cfd
OTLP Receiver: Only update metadata to WAL when metadata-wal-records feature is enabled (#17472)

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-11-13 09:53:12 +01:00
Ben Kochie
204249fcb5
Update golangci-lint (#17478)
* Update golangci-lint to v2.6.0
* Fixup various linting issues.
* Fixup deprecations.
* Add exception for `labels.MetricName` deprecation.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-05 13:47:34 +01:00
Minh Nguyen
30992dd032
[RW2] Fix: Only update metadata to WAL when metadata-wal-records feature is enabled (#17470)
* add feature check when UpdateMetadata

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* add appendMetadata boolean to write_handler

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-11-04 08:16:57 +00:00
Ben Kochie
48956f60d7
Update modernize (#17471)
Apply additional Go modernize tool improvements.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-04 05:13:49 +00:00
Minh Nguyen
784ec0a792
update test to test both v1 and v2 (#17467)
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-11-03 09:22:46 +00:00
pipiland2612
704afd8529 add timeSeriesAgeChecker to refactor filter code
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-31 23:19:53 +02:00
pipiland2612
9e6a626dae create timeSeriesStats to reduce return variables
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-31 22:17:45 +02:00
pipiland2612
e1cb29bf8a create common struct and function to DRY
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-31 21:55:14 +02:00
Minh Nguyen
c8f1de18a7
[RW2] Fix type and unit labels propagation in Remote Write v2 receiver to prioritize type-and-unit-labels feature (#17387)
* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix nits & update docs

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix docs

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-31 08:59:03 +00:00
Björn Rabenstein
84d2007a08
Merge pull request #17423 from geogrego/main
docs: Fix typos
2025-10-30 16:56:48 +01:00
György Krajcsovits
b8192127ee
Merge remote-tracking branch 'origin/release-3.7' into krajo/merge-3.7.3-to-main
# Conflicts:
#	CHANGELOG.md
#	storage/remote/queue_manager_test.go
2025-10-30 09:21:25 +01:00
Laurent Dufresne
7621eb772c histogram: Add Error type for all histogram errors
`histogram.Error` becomes the generic wrapper type for all histogram errors.
This makes it easier, and less error-prone when adding new errors, to check
whether an error is a histogram error; it also makes converting the errors
less error-prone.

This changes the type of those specific sentinel errors from error to
`histogram.Error`, but that should almost never matter;
e.g., `errors.Is(err, ErrHistogram...)` still works out of the box.

Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>
2025-10-30 08:45:34 +01:00
Ayoub Mrini
6806b68f93
[release-3.7] fix: Remote-write: revert changes in the queue resharding logic (#17412)
* Revert "chore: deprecate prometheus_remote_storage_{samples,exemplars,histograms}_in_total and prometheus_remote_storage_highest_timestamp_in_seconds"

This reverts commit ba14bc49db31a1b0ba3127e6ddf59a9f32a08dff.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

* Revert "storage/remote: compute highestTimestamp and dataIn at QueueManager level"

This reverts commit 184c7eb9186aa8fea09920f2f8e8aa8a603da300.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

* fix(remote-write): bring back the per queue metrics

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

* test(remote): add TestRemoteWrite_ReshardingWithoutDeadlock to reproduce the sharding scale up deadlock

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

---------

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-10-29 14:04:09 +00:00
geogrego
58dbe927d5 docs: minor improvement for docs
Signed-off-by: geogrego <geogrego@outlook.com>
2025-10-29 14:42:14 +08:00
Arve Knudsen
c36e966bf8
OTLP: de-duplicate target_info samples with conflicting timestamps (#17400)
Add logic to the target_info metric generation in the OTLP endpoint, so that any samples with the same timestamp for the same (target_info) series are de-duplicated. It comes out of a user's bug report about duplicated target_info samples in Grafana Mimir (which uses the Prometheus target_info generation logic).

If I'm not mistaken, duplicate target_info samples should stem from multiple resources in the same OTLP request being translated to the same target_info label set. It shouldn't be caused by a Prometheus bug.
2025-10-28 14:13:43 +00:00
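The de-duplication described above boils down to keying samples by (series, timestamp). The sketch below is hypothetical and simplified: `Sample` stands in for a real target_info sample with its label set rendered as a string key, and the choice to keep the *first* sample for each pair is an assumption; the commit message does not say which duplicate survives.

```go
package main

import "fmt"

// Sample is a hypothetical, simplified sample: SeriesKey stands in for the
// target_info label set rendered as a canonical string.
type Sample struct {
	SeriesKey string
	Timestamp int64
	Value     float64
}

// dedupSameTimestamp keeps the first sample for each (series, timestamp)
// pair and drops later ones, preserving the order of the samples it keeps.
func dedupSameTimestamp(in []Sample) []Sample {
	type seriesAt struct {
		series string
		ts     int64
	}
	seen := make(map[seriesAt]struct{}, len(in))
	out := make([]Sample, 0, len(in))
	for _, s := range in {
		k := seriesAt{s.SeriesKey, s.Timestamp}
		if _, dup := seen[k]; dup {
			continue
		}
		seen[k] = struct{}{}
		out = append(out, s)
	}
	return out
}

func main() {
	// Two OTLP resources that translate to the same target_info label set.
	in := []Sample{
		{`{job="a",instance="x"}`, 1000, 1},
		{`{job="a",instance="x"}`, 1000, 1},
		{`{job="a",instance="x"}`, 2000, 1},
	}
	fmt.Println(len(dedupSameTimestamp(in))) // 2
}
```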
Minh Nguyen
6bb367970e
feat(promtool): add RW2 support to promtool push metrics using client_golang library (#17280)
* Add WriteProto method and tests for promtool metrics

This commit adds:
1. WriteProto method to storage/remote/client.go that handles
   marshaling and compression of protobuf messages
2. Updated parseAndPushMetrics in cmd/promtool/metrics.go to use
   the new WriteProto method
3. Comprehensive tests for PushMetrics functionality

The WriteProto method provides a cleaner API for sending protobuf
messages without manually handling marshaling and compression.

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* use Write method from exp/api/remote

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix lint

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix test

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* nit fixed

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix lint

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-27 13:56:48 +00:00
Minh Nguyen
f070e35358
[RW]: Adopt client_golang/exp/api/remote types for receiving RW1 and RW2 (#17197)
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

# Conflicts:
#	storage/remote/write_handler.go

* add comment

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix failing test

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* nit_fixing

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix comment

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-24 10:31:34 +01:00
Julius Hinze
05612757b4
prometheusremotewrite: fix require.equal argument order (#17391)
Signed-off-by: Julius Hinze <julius.hinze@grafana.com>
2025-10-23 15:13:32 +02:00
Arve Knudsen
ef42c088ba
OTLP: Add configuration parameters to control label name translation (#17345)
As a follow-up to #17344, add two configuration parameters for controlling label
name translation, both defaulting to on for backwards compatibility (currently
these behaviours are hardcoded as enabled):

* otlp.label_name_underscore_sanitization => Prefix label names starting with a
  single underscore with key_ when translating OTel attribute names
* otlp.label_name_preserve_multiple_underscores => Keep multiple consecutive
  underscores in label names when translating OTel attribute names

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-10-22 08:27:35 +02:00
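The two translation behaviours toggled by `otlp.label_name_underscore_sanitization` and `otlp.label_name_preserve_multiple_underscores` can be sketched as below. This is a hypothetical illustration, not prometheus/otlptranslator code: `translateLabelName` is an invented name, it assumes invalid characters were already replaced with `_`, and it applies the prefix check before collapsing underscores; the real translator's ordering and edge cases (such as names starting with `__`) may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// translateLabelName is a hypothetical sketch of the two option-controlled
// behaviours. Assumes invalid characters are already replaced with '_'.
func translateLabelName(name string, underscoreSanitization, preserveMultipleUnderscores bool) string {
	if underscoreSanitization && strings.HasPrefix(name, "_") && !strings.HasPrefix(name, "__") {
		// A single leading underscore gets a "key" prefix: "_foo" -> "key_foo".
		name = "key" + name
	}
	if !preserveMultipleUnderscores {
		// Collapse each run of consecutive underscores into one.
		var b strings.Builder
		prev := rune(0)
		for _, r := range name {
			if r == '_' && prev == '_' {
				continue
			}
			b.WriteRune(r)
			prev = r
		}
		name = b.String()
	}
	return name
}

func main() {
	// Both options on (the backwards-compatible default).
	fmt.Println(translateLabelName("_foo", true, true))   // key_foo
	fmt.Println(translateLabelName("a___b", true, true))  // a___b
	// Each option turned off individually.
	fmt.Println(translateLabelName("_foo", false, true))  // _foo
	fmt.Println(translateLabelName("a___b", true, false)) // a_b
}
```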
György Krajcsovits
ea398c15e8
Merge branch 'release-3.7' into krajo/merge-release-3071-to-main
2025-10-17 10:45:55 +02:00
Arve Knudsen
99d0967133 Fix lint failure
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-10-16 16:56:18 +02:00
Arve Knudsen
f5804e7cf2 Remove configuration parameters
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-10-16 16:37:24 +02:00
Arve Knudsen
3de3a296dd Add reviewer feedback
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-10-16 16:13:08 +02:00
Arve Knudsen
dd3a607d2d Add configuration parameters
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-10-16 16:13:08 +02:00
Arve Knudsen
7cf4b5da55 OTLP: Upgrade prometheus/otlptranslator
The upgrade to prometheus/otlptranslator@7f02967de0 fixes two label
name translation bugs, when in legacy name translation mode:
* 'key' is no longer prefixed when label names start with an underscore
* Multiple consecutive underscores are combined into one

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-10-16 16:13:08 +02:00
harsh kumar
16a9a827de
remote-write: Add type and unit labels to 2.0 receiver when feature flag enabled (#17329)
* feat(remote): add support for type and unit labels in write handler

Signed-off-by: Harsh <harshmastic@gmail.com>

* minor fixes

Signed-off-by: Harsh <harshmastic@gmail.com>

* fix failing tests

Signed-off-by: Harsh <harshmastic@gmail.com>

* Update storage/remote/write_handler.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: harsh kumar <135993950+hxrshxz@users.noreply.github.com>

* Update storage/remote/write_handler.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: harsh kumar <135993950+hxrshxz@users.noreply.github.com>

* refactor: streamline label handling for type and unit in write handler tests

Signed-off-by: Harsh <harshmastic@gmail.com>

* test: enhance V2 message tests for type and unit labels

Signed-off-by: Harsh <harshmastic@gmail.com>

---------

Signed-off-by: Harsh <harshmastic@gmail.com>
Signed-off-by: harsh kumar <135993950+hxrshxz@users.noreply.github.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2025-10-15 18:19:41 +01:00
beorn7
ad7d1aed99 Phase out native histogram feature flag
The detailed plan for this is laid out in
https://github.com/prometheus/prometheus/issues/16572 .

This commit adds a global and local scrape config option
`scrape_native_histograms`, which has to be set to true to ingest
native histograms.

To ease the transition, the feature flag is changed to simply set the
default of `scrape_native_histograms` to true.

Further implications:

- The default scrape protocols now depend on the
  `scrape_native_histograms` setting.
- Everywhere else, histograms are now "on by default".

Documentation beyond that for the feature flag and the scrape
config is deliberately left out. See
https://github.com/prometheus/prometheus/pull/17232 for that.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-15 14:50:52 +02:00
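The resolution of `scrape_native_histograms` described above can be sketched as a small precedence function. Everything here is hypothetical: the config structs, `scrapeNativeHistograms`, and `boolPtr` are illustrative names, and the rule that a per-job value overrides the global one is an assumption based on how other Prometheus scrape settings behave. The commit only states that the feature flag sets the default to true; the dependency of the default scrape protocols on this setting is not modelled.

```go
package main

import "fmt"

// GlobalConfig and ScrapeConfig are hypothetical stand-ins; a nil pointer
// means scrape_native_histograms was not set at that level.
type GlobalConfig struct {
	ScrapeNativeHistograms *bool
}

type ScrapeConfig struct {
	ScrapeNativeHistograms *bool
}

// scrapeNativeHistograms resolves the effective setting for one scrape job.
// The feature flag only changes the built-in default (false -> true);
// an explicit per-job value is assumed to win over an explicit global one.
func scrapeNativeHistograms(featureFlag bool, g GlobalConfig, s ScrapeConfig) bool {
	if s.ScrapeNativeHistograms != nil {
		return *s.ScrapeNativeHistograms
	}
	if g.ScrapeNativeHistograms != nil {
		return *g.ScrapeNativeHistograms
	}
	return featureFlag
}

func boolPtr(b bool) *bool { return &b }

func main() {
	fmt.Println(scrapeNativeHistograms(false, GlobalConfig{}, ScrapeConfig{})) // false
	fmt.Println(scrapeNativeHistograms(true, GlobalConfig{}, ScrapeConfig{}))  // true
	// A job can still opt out even with the feature flag set.
	fmt.Println(scrapeNativeHistograms(true, GlobalConfig{}, ScrapeConfig{ScrapeNativeHistograms: boolPtr(false)})) // false
}
```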
Fiona Liao
9a5bccbd4b
refactor: make OTEL temporality check easier to read (#16692)
* Make OTEL temporality check easier to read
* Add nolint comment

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
2025-10-14 13:29:23 +02:00
George Krajcsovits
fe11cae637
Merge pull request #17287 from linasm/reject-nan-histogram-custom-bounds
NHCB: Reject custom bucket bounds with NaN value
2025-10-06 18:11:03 +02:00
Linas Medziunas
c16db58061 NHCB: Reject custom bucket bounds with NaN value
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2025-10-06 16:37:28 +03:00
Minh Nguyen
106e6f2c77
[RW2] Return 400 for Exemplars without Series or Histograms not written (#17250)
* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix cmt

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-10-06 12:53:44 +01:00
beorn7
3d7cf4c274 model/histogram: Validate non-negative count and zero bucket
We have always validated that none of the buckets is negative. We
should do the same for the count of observations and the zero bucket.

Note that this was always implied in the protobuf exposition format
because a count or a zero bucket population is ignored if it is not
positive.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-01 16:40:41 +02:00
Bryan Boreham
0d3ec05056
Merge pull request #17043 from machine424/ffl
chore: allow seamless use of testing/synctest for >=go1.24
2025-09-30 12:11:12 +01:00
György Krajcsovits
a5a6413c1a
better error naming and formatting, typo fixes
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-23 11:20:55 +02:00
György Krajcsovits
6e42da8904
feat(remote): reduce resolution of native histograms on remote read
If a sample read through remote read has too high resolution,
reduce it to the maximum allowed.

This is a slow path, but we only expect it to happen if the server
side is a newer version that allows higher resolution.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-23 11:20:55 +02:00
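Reducing resolution means lowering the schema and merging buckets. At schema s, bucket i of a standard exponential native histogram has the upper bound 2^(i·2^-s), so lowering the schema by d merges each run of 2^d adjacent buckets into one. The sketch below is hypothetical: `targetIndex` and `reduceToSchema` are invented names, it works on a sparse map of absolute bucket indices instead of the span/delta encoding Prometheus stores, and it does not handle custom-bucket schemas.

```go
package main

import "fmt"

// targetIndex maps a bucket index at originSchema to the index of the bucket
// that contains it at the lower targetSchema. Go's >> on signed integers is
// an arithmetic shift, so negative indices round correctly as well.
func targetIndex(originIdx, originSchema, targetSchema int32) int32 {
	return ((originIdx - 1) >> (originSchema - targetSchema)) + 1
}

// reduceToSchema merges bucket counts, keyed by absolute bucket index,
// from originSchema down to targetSchema.
func reduceToSchema(buckets map[int32]float64, originSchema, targetSchema int32) (map[int32]float64, error) {
	if targetSchema > originSchema {
		return nil, fmt.Errorf("cannot increase resolution from schema %d to %d", originSchema, targetSchema)
	}
	out := make(map[int32]float64, len(buckets))
	for idx, count := range buckets {
		out[targetIndex(idx, originSchema, targetSchema)] += count
	}
	return out, nil
}

func main() {
	// Schema 1 buckets 1..4 collapse pairwise into schema 0 buckets 1 and 2.
	got, _ := reduceToSchema(map[int32]float64{1: 1, 2: 2, 3: 3, 4: 4}, 1, 0)
	fmt.Println(got) // map[1:3 2:7]
}
```

Returning an error rather than panicking for an invalid target schema matches the "Don't panic!" spirit of the later validation work.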
György Krajcsovits
794c545930
Merge remote-tracking branch 'origin/main' into krajo/native-histogram-schema-validation
2025-09-23 10:51:02 +02:00
Minh Nguyen
d04550a9c4
[RW2] Return 400 error code for wrongly-formatted histograms (#17210)
* return 400 error code

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* add more cases

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* format code

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* nit_fixing

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-09-23 07:24:46 +02:00
machine424
365409d3be
chore: allow seamless use of testing/synctest for >=go1.24
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-09-19 22:48:25 +02:00
György Krajcsovits
5b39b79f5a
refactor error creation and tests
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-19 09:26:34 +02:00