16200 Commits

Author SHA1 Message Date
George Krajcsovits
55a4782eb7
Merge pull request #17214 from prometheus/krajo/native-histogram-schema-wal
Native histograms: ignore invalid schemas from WAL and log
2025-09-24 14:59:18 +02:00
George Krajcsovits
35d9f28c87
Update tsdb/record/record.go
Co-authored-by: Björn Rabenstein <beorn@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
2025-09-24 14:27:37 +02:00
György Krajcsovits
30f941c57c
fix(wal): ignore invalid native histogram schemas on load
Reduce the resolution of histograms as needed and ignore invalid
schemas while emitting a warning log.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-24 11:41:25 +02:00
George Krajcsovits
f53782b009
Merge pull request #17213 from prometheus/krajo/native-histogram-schema-reduce
Native histograms: reduce resolution as needed when reading from chunk or remote read
2025-09-24 11:28:35 +02:00
George Krajcsovits
112f91803c
Merge pull request #17189 from prometheus/krajo/native-histogram-schema-validation
fix(nativehistograms): validation should fail on unsupported schemas
2025-09-24 11:27:26 +02:00
Michael Shen
1eaddc64d0
Migrate K8s discovery service queues to use strongly typed queues
Signed-off-by: Michael Shen <mishen@umich.edu>
2025-09-23 20:32:11 -07:00
Michael Shen
9c525b84c4
Add deprecation notice to associated K8s endpoints API objects
Signed-off-by: Michael Shen <mishen@umich.edu>
2025-09-23 20:30:37 -07:00
Michael Shen
1703e54dfd
Update to k8s.io v0.33.5
Signed-off-by: Michael Shen <mishen@umich.edu>
2025-09-23 20:30:36 -07:00
Simon Pasquier
dde7d6ad37
doc: clarify start/end for label API endpoints (#17217)
Because the label API endpoints read from the TSDB indexes, they can
return information for series which are present in the index but have no
samples in the queried interval.

Add similar note for the series endpoint.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2025-09-23 12:03:14 +01:00
György Krajcsovits
a5a6413c1a
better error naming and formatting, typo fixes
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-23 11:20:55 +02:00
György Krajcsovits
6e42da8904
feat(remote): reduce resolution of native histograms on remote read
If a sample read through remote read has too high a resolution,
reduce it to the maximum allowed.

This is a slow path, but we only expect it to happen if the server
side is a newer version that allows higher resolution.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-23 11:20:55 +02:00
György Krajcsovits
b6df8d3274
feat(chunkenc): allow more native histograms schemas
Allow -9..52 schemas instead of just -4..8, but reduce resolution to 8 if
above.

The reduce code path will be slow, but we only expect it to happen if
TSDB already has higher resolution samples and we are in a rollback.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

# Conflicts:
#	model/histogram/generic.go
2025-09-23 11:20:48 +02:00
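Reducing resolution merges adjacent exponential buckets: each step down in schema combines pairs of buckets, so going from schema 10 to 8 folds every four buckets into one. A minimal Go sketch of that index mapping, assuming the native-histogram convention that bucket index i at schema s has upper bound 2^(i·2^-s). `targetIdx` and `reduceBuckets` are illustrative names, and the map-based layout is a simplification: real code works on spans of positive and negative buckets.

```go
package main

import "fmt"

// targetIdx maps a bucket index at originSchema to the index of the wider
// bucket containing it at targetSchema (targetSchema <= originSchema).
// Go's >> on signed integers is an arithmetic shift, so negative indices
// round toward -infinity as the grouping requires.
func targetIdx(idx, originSchema, targetSchema int32) int32 {
	return ((idx - 1) >> (originSchema - targetSchema)) + 1
}

// reduceBuckets merges sparse bucket counts (index -> count) from
// originSchema down to targetSchema by summing counts that land in the
// same target bucket. Illustrative only.
func reduceBuckets(buckets map[int32]uint64, originSchema, targetSchema int32) map[int32]uint64 {
	out := make(map[int32]uint64, len(buckets))
	for idx, count := range buckets {
		out[targetIdx(idx, originSchema, targetSchema)] += count
	}
	return out
}

func main() {
	// Schema 10 is above the chunk maximum of 8: reduce by two steps.
	in := map[int32]uint64{1: 3, 2: 1, 4: 2, 5: 7, 0: 4, -3: 1}
	out := reduceBuckets(in, 10, 8)
	fmt.Println(out[1], out[2], out[0]) // 6 7 5
}
```

Indices 1..4 fold into target bucket 1, 5..8 into 2, and -3..0 into 0, which is why the reduction only ever loses resolution and never moves an observation across a bucket boundary.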
György Krajcsovits
794c545930
Merge remote-tracking branch 'origin/main' into krajo/native-histogram-schema-validation
2025-09-23 10:51:02 +02:00
Minh Nguyen
d04550a9c4
[RW2] Return 400 error code for wrongly-formatted histograms (#17210)
* return 400 error code

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* fix

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* add more cases

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* format code

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

* nit_fixing

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>

---------

Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-09-23 07:24:46 +02:00
machine424
365409d3be
chore: allow seamless use of testing/synctest for >=go1.24
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-09-19 22:48:25 +02:00
György Krajcsovits
5b39b79f5a
refactor error creation and tests
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-19 09:26:34 +02:00
György Krajcsovits
b99378f2c4
Merge remote-tracking branch 'origin/main' into krajo/native-histogram-schema-validation
2025-09-19 08:59:00 +02:00
George Krajcsovits
5e6900558a
Apply suggestions from code review
Co-authored-by: Björn Rabenstein <beorn@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
2025-09-19 08:58:27 +02:00
beorn7
aac5cc3d99
web: Trim excessive line length in federate.go
Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-19 00:35:12 +02:00
Björn Rabenstein
d5cc5e2738
Merge pull request #17071 from prometheus/beorn7/tsdb
tsdb: Fix commit order for mixed-typed series
2025-09-18 13:55:31 +02:00
George Krajcsovits
95b0d75fbc
Merge pull request #17201 from prometheus/krajo/ignore-duplicate-ct
perf(otlp): reduce logs from OTLP endpoint
2025-09-18 13:37:51 +02:00
György Krajcsovits
f0a297bb7c
fix(remote): validate native histogram schema in remote read
When remote read returns chunks, the validation happens in tsdb/chunkenc.
However, when it returns samples, we need to modify the iterator to
validate them.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-18 11:09:45 +02:00
György Krajcsovits
267be7dc20
fix(chunkenc): error out when reading unknown histogram schemas from chunks
Otherwise, higher-level code like PromQL needs to constantly check
whether it can handle the samples.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-18 09:21:03 +02:00
Ayoub Mrini
4917346065
Merge pull request #17203 from machine424/release36
chore: prepare release 3.6.0
v0.306.0 v3.6.0
2025-09-17 21:05:30 +02:00
beorn7
bd0bf66f31
tsdb: Include floatHistograms in headAppender.Rollback()
Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-17 19:22:25 +02:00
beorn7
b1fbf4f1e2
tsdb: Refactor staleness marker handling
With the fixed commit order, we can now handle the conversion of float
staleness markers to histogram staleness markers in a more direct way.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-17 19:22:25 +02:00
beorn7
385d2800c9
promqltest: Add regression test for mixed-sample commit order
Regression test for:
- https://github.com/prometheus/prometheus/issues/14172
- https://github.com/prometheus/prometheus/issues/15177

Test cases are by @krajorama, taken from commit
b48bc9dc7e2ac553528763297cca73014357d542 .

Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-17 19:22:25 +02:00
beorn7
7e82bdb75b
tsdb: Fix commit order for mixed-typed series
Fixes https://github.com/prometheus/prometheus/issues/15177

The basic idea here is to divide the samples to be committed into (sub)
batches whenever we detect that the same series receives a sample of a
type different from the previous one. We then commit those batches one
after another, and we log them to the WAL one after another, so that
we hit both birds with the same stone. The cost of the stone is that
we have to track the sample type of each series in a map. Given the
amount of things we already track in the appender, I hope that it
won't make a dent. Note that this even addresses the NHCB special case
in the WAL.

This does a few other things that I could not resist picking up along
the way:

- It adds more zeropool.Pools and uses the existing ones more
  consistently. My understanding is that this was merely an oversight.
  Maybe the additional pool usage will compensate for the increased
  memory demand of the map.

- Create the synthetic zero sample for histograms a bit more
  carefully. So far, we created a sample that always went into its own
  chunk. Now we create a sample that is compatible enough with the
  following sample to go into the same chunk. This changed the test
  results quite a bit. But IMHO it makes much more sense now.

- Continuing past efforts, I renamed more occurrences of `Samples` to
  `Floats` to keep things consistent and less confusing. (Histogram
  samples are also samples.) I still avoided changing names in other
  packages.

- I added a few shortcuts `h := a.head`, saving many characters.

TODOs:

- Address @krajorama's TODOs about commit order and staleness handling.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-17 19:22:25 +02:00
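The batching idea described above can be sketched in a few lines. This is a minimal illustration, assuming a flat list of buffered samples; `sampleType`, `pendingSample` and `splitIntoBatches` are hypothetical and far simpler than the real `headAppender`. A new batch starts only when a series changes sample type within the current batch, and the per-series map is reset with each batch because earlier samples are already safely ordered in the previous one.

```go
package main

import "fmt"

// sampleType is a hypothetical tag for the kind of sample being appended.
type sampleType int

const (
	stFloat sampleType = iota
	stHistogram
	stFloatHistogram
)

// pendingSample is a hypothetical stand-in for a sample buffered in the
// appender before Commit.
type pendingSample struct {
	series uint64 // series reference
	typ    sampleType
}

// splitIntoBatches starts a new batch whenever a series receives a sample
// whose type differs from that series' previous sample in the current
// batch. Committing (and WAL-logging) the batches in order preserves each
// series' append order even if a batch is applied type by type.
func splitIntoBatches(samples []pendingSample) [][]pendingSample {
	var batches [][]pendingSample
	var cur []pendingSample
	lastType := map[uint64]sampleType{} // per-series type within cur
	for _, s := range samples {
		if t, ok := lastType[s.series]; ok && t != s.typ {
			batches = append(batches, cur)
			cur = nil
			lastType = map[uint64]sampleType{}
		}
		cur = append(cur, s)
		lastType[s.series] = s.typ
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	batches := splitIntoBatches([]pendingSample{
		{1, stFloat}, {2, stHistogram}, {1, stFloat},
		{1, stHistogram}, // series 1 switches type: new batch
		{2, stHistogram},
		{1, stFloat}, // and back again: another batch
	})
	fmt.Println(len(batches)) // 3
}
```

Samples of different types for different series share a batch freely, so in the common case of one type per series there is only a single batch and the extra cost is just the map.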
beorn7
46cfc9fb99
tsdb: Extend TestDataNotAvailableAfterRollback
This exposes the omission of float histograms from the rollback.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-17 19:22:25 +02:00
machine424
8462515c75
test(storage/remote/queue_manager_test.go): use synctest in TestShutdown for better
control over time

The test became flaky after it was made to run in parallel
and "fight" for resources.

Let's hide all of that.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-09-17 11:20:07 +02:00
Ayoub Mrini
7416f33df5
chore: define golangci-lint version in a single place and bump to v2.4.0 (#17202)
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-09-17 10:52:09 +02:00
machine424
5af40c2404
chore(workflows/check_release_notes): do not run on dependabot PRs and only run against main
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-09-17 09:35:59 +02:00
machine424
65b1cd5ae2
chore: prepare release 3.6.0
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-09-17 09:20:59 +02:00
György Krajcsovits
0cf54d7819
perf(otlp): reduce logs from OTLP endpoint
It's not possible to store a created timestamp at the same timestamp as
the current sample, so do not even try.

In OpenTelemetry spec, if the start time is unknown, it will be set to
the same timestamp as the first sample.
https://opentelemetry.io/docs/specs/otel/metrics/data-model/#cumulative-streams-handling-unknown-start-time
This means that we would get a lot of "duplicate sample for timestamp"
errors, and we should not log those.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-17 08:50:43 +02:00
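The check described above amounts to a single comparison. A minimal sketch, assuming int64 millisecond timestamps; `shouldAppendCT` is a hypothetical helper, not the OTLP handler's actual code, and also rejecting a created timestamp after the sample is an extra assumption of this sketch.

```go
package main

import "fmt"

// shouldAppendCT reports whether a created-timestamp (CT) zero sample can
// be stored ahead of a sample at ts. Hypothetical helper. When OTLP reports
// an unknown start time as equal to the first sample's timestamp, ct == ts
// and appending would only fail with a duplicate-timestamp error, so it is
// skipped. Rejecting ct > ts as well is an assumption of this sketch.
func shouldAppendCT(ct, ts int64) bool {
	return ct < ts
}

func main() {
	fmt.Println(shouldAppendCT(1000, 1000), shouldAppendCT(500, 1000)) // false true
}
```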
George Krajcsovits
ccfda912e3
Merge pull request #17015 from Garbett1/update-fsnotify
chore: update fsnotify
2025-09-16 14:02:08 +02:00
Andrew Hall
aa922ce3b6
Added support for string literals and range results for instant queries in test scripting framework (#17055)
Signed-off-by: Andrew Hall <andrew.hall@grafana.com>
Co-authored-by: Charles Korn <charleskorn@users.noreply.github.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-09-16 12:28:19 +01:00
Bryan Boreham
26279e5b6d
Merge pull request #17066 from cuiweixie/reflect.TypeFor-discovery
discovery: refactor to use reflect.TypeFor

Use a neater form, introduced in Go 1.22.
2025-09-16 12:22:14 +01:00
Bryan Boreham
0a3c64631c
Merge pull request #17195 from dancer1325/docs/fix_gettingstarted_outdated_graph_references
docs(): fix gettingStarted outdated graph reference
2025-09-16 12:11:35 +01:00
dancer1325
a14faab435
docs(): fix gettingStarted outdated graph reference
/graph no longer exists in the new React app; it has been folded into /query.

Signed-off-by: dancer1325 <alfredotic0809@gmail.com>
2025-09-15 17:31:18 +02:00
György Krajcsovits
bdf547ae9c
fix(nativehistograms): validation should fail on unsupported schemas
Histogram.Validate and FloatHistogram.Validate now return error on
unsupported schemas.

The scrape and remote-write handlers reduce the schema to the maximum
allowed if it is above that maximum but below the theoretical maximum of 52.
For scrape, the maximum is a configuration option; for remote write, it is 8.

Note: the OTLP endpoint already does the reduction, without checking that
the schema is below 52, as the spec does not specify a maximum.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-13 16:54:44 +02:00
Aditya Tiwari
1c974108f3
docs: fix typos and formatting in querying functions and storage
Signed-off-by: Aditya Tiwari <adityatiwari342005@gmail.com>
2025-09-11 19:22:58 +05:30
NamanParlecha
594f9d63a5
refactor(textparse): Introduce Variadic options in textParse.New (#17155)
* refactor(textparse): introduce ParserOptions struct for cleaner parser initialization

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(fuzz): update fuzzParseMetricWithContentType to use ParserOptions

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parser): simplify ParserOptions usage in tests and implementations

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parse): using variadic options

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parser): add fallbackType & SymbolTable to variadic options

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parser): private fields

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(scrape): compose parser options

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parser): add comments

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parser): update to use ParserOptions struct for configuration

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(scrape): remove unused parserOptions field from scrapeLoop

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

* refactor(parser): update ParserOptions field names and add comments for clarity

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>

---------

Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>
2025-09-11 10:49:42 +01:00
Ayoub Mrini
6ae5aaada9
Merge pull request #17168 from machine424/36rc1
chore: prepare release 3.6.0-rc.1
v0.306.0-rc.1 v3.6.0-rc.1
2025-09-10 15:30:52 +02:00
Björn Rabenstein
d7e9a2ffb0
Merge pull request #17141 from prometheus/beorn7/histogram3
promql: Use `HistogramStatsIterator` more often
2025-09-09 16:45:06 +02:00
beorn7
0fa70e0f6c
promql: Use HistogramStatsIterator more often
The current code stops the walk after we have found the first relevant
function. However, in expressions with multiple legs, we will then use
the `HistogramStatsIterator` at most once. This change should make
sure we explore all legs.

The added tests make sure we are not using `HistogramStatsIterator`
where we shouldn't (but the opposite can only be seen in a benchmark
or with a more explicit test).

Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-09 16:09:22 +02:00
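The fix concerns how far the AST walk goes. A rough sketch of the difference, using a hypothetical, heavily simplified `node` type rather than the real PromQL parser types. The set in `statsFuncs` is an assumed list of functions needing only count and sum. The walk keeps going after a match, so both legs of a binary expression get marked.

```go
package main

import "fmt"

// node is a hypothetical, heavily simplified PromQL AST node.
type node struct {
	fn       string  // function name, empty for operators and selectors
	children []*node // sub-expressions, e.g. the two legs of a binary op
	useStats bool    // set if this leg may use a stats-only iterator
}

// statsFuncs is an assumed set of functions that only need a histogram's
// count and sum, not its buckets.
var statsFuncs = map[string]bool{
	"histogram_count": true,
	"histogram_sum":   true,
	"histogram_avg":   true,
}

// markStatsLegs visits every node and marks each matching function,
// returning how many it marked. A walk that stopped at the first match
// would mark at most one leg.
func markStatsLegs(n *node) int {
	marked := 0
	if statsFuncs[n.fn] {
		n.useStats = true
		marked++
	}
	for _, c := range n.children {
		marked += markStatsLegs(c)
	}
	return marked
}

func main() {
	// histogram_count(a) / histogram_sum(b): a binary expression with two legs.
	root := &node{children: []*node{
		{fn: "histogram_count", children: []*node{{}}},
		{fn: "histogram_sum", children: []*node{{}}},
	}}
	fmt.Println(markStatsLegs(root)) // 2
}
```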
beorn7
c84cf3622f
promql: Add a two-legged benchmark for HistogramStatsIterator
Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-09 16:08:10 +02:00
Björn Rabenstein
fda99c6b35
Merge pull request #17127 from prometheus/beorn7/histogram2
Fix and optimize `HistogramStatsIterator` usage
2025-09-09 15:52:49 +02:00
machine424
dfb24f4ba0
chore: prepare release 3.6.0-rc.1
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-09-09 15:24:28 +02:00
Ayoub Mrini
6e06943e38
Merge pull request #17089 from prometheus/superq/stale_tracking
Pick #16925 into v3.6.0
2025-09-09 15:17:47 +02:00
beorn7
121de76cbb
promqltest: Remove now needless 1* work-around
Prior to #17127, we needed to add another level in the AST to trigger
the usage of `HistogramStatsIterator`. This is fixed now.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-09 14:59:15 +02:00