16177 Commits

Author SHA1 Message Date
Bryan Boreham
aa12c0d4c3
Merge pull request #17074 from prymitive/logs
TSDB: Log when GC / block write starts
2025-09-02 12:55:12 +01:00
Daniel Gospodinow
562d13e930
docs: minor grammar improvements in basics.md (#17077)
Corrected minor grammatical errors in the documentation.

Signed-off-by: Daniel Gospodinow <danielgospodinow@gmail.com>
2025-09-02 12:44:14 +01:00
Bryan Boreham
8e133e100f
Merge pull request #17081 from prometheus/superq/if_err_nil
tsdb: Fixup err nil checks
2025-09-02 12:37:51 +01:00
George Krajcsovits
d09db02854
fix(nhcb): flaky test TestConvertClassicHistogramsToNHCB (#17112)
* fix(nhcb): flaky test TestConvertClassicHistogramsToNHCB

The test was e2e, including actually scraping an HTTP endpoint and running
the scrape loop. This led to some timing issues.

I've simplified it to call the scrape loop append directly. I think that
this isn't nice as that is a private interface, but it should get rid of the
flakiness, and there's already a bunch of tests doing this.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-02 13:37:14 +02:00
Bryan Boreham
70bf09cb2b
Merge pull request #16429 from prymitive/scrapeCacheStaleNaN
Append staleness markers only for known series
2025-09-02 10:41:07 +01:00
Duciwuci
070ffd7edb bump go version across all stages
Signed-off-by: Duciwuci <duciwuci@gmail.com>
2025-09-02 10:02:39 +02:00
Duciwuci
739791a285 update script for internal and web
Signed-off-by: Duciwuci <duciwuci@gmail.com>
2025-09-02 10:02:01 +02:00
Craig Ringer
30bf18f968 test: Add additional tests for mixed float/histogram series
Add further tests for first_over_time (also covering existing
last_over_time, count_over_time, etc) to exercise vectors
containing a mix of float and histogram samples where the
histogram samples do not come last in the series.

This tripped over https://github.com/prometheus/prometheus/issues/17025
so it's structured a bit oddly to work around that bug in the
appender as used by promtest.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
2025-09-02 10:24:37 +12:00
Craig Ringer
1ce84d8e2f feat(promql): add first_over_time and ts_of_first_over_time
Add a first_over_time function, and corresponding ts_of_first_over_time
function.  Both are behind the experimental functions feature flag.

Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
2025-09-02 10:24:31 +12:00
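
The behaviour described for first_over_time and ts_of_first_over_time can be sketched in plain Go. This is only an illustration, not the PromQL engine code: the `sample` type, the millisecond timestamps, the left-open/right-closed window, and the seconds conversion are all assumptions made for the example.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair; T is in milliseconds.
type sample struct {
	T int64
	V float64
}

// firstOverTime returns the earliest sample with start < T <= end.
// samples must be sorted by ascending timestamp.
func firstOverTime(samples []sample, start, end int64) (sample, bool) {
	for _, s := range samples {
		if s.T > start && s.T <= end {
			return s, true
		}
	}
	return sample{}, false
}

// tsOfFirstOverTime returns the timestamp of that sample in seconds
// (the unit is an assumption of this sketch).
func tsOfFirstOverTime(samples []sample, start, end int64) (float64, bool) {
	s, ok := firstOverTime(samples, start, end)
	if !ok {
		return 0, false
	}
	return float64(s.T) / 1000, true
}

func main() {
	series := []sample{{1000, 5}, {2000, 7}, {3000, 9}}
	first, _ := firstOverTime(series, 1000, 3000)
	ts, _ := tsOfFirstOverTime(series, 1000, 3000)
	fmt.Println(first.V, ts)
}
```

The sketch handles float samples only; the commit above also covers series that mix float and histogram samples.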
Ayoub Mrini
c5743037ef
Merge pull request #17065 from machine424/queuelevel
storage/remote: compute highestTimestamp and dataIn at QueueManager level
2025-09-01 20:12:19 +02:00
machine424
ba14bc49db
chore: deprecate prometheus_remote_storage_{samples,exemplars,histograms}_in_total and prometheus_remote_storage_highest_timestamp_in_seconds
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-09-01 13:19:28 +02:00
machine424
184c7eb918
storage/remote: compute highestTimestamp and dataIn at QueueManager level
Because of relabelling, an endpoint may select only a subset of the series
that go through WriteStorage.

Having a highestTimestamp at WriteStorage level yields wrong values
if the corresponding sample won't even make it to a remote queue.

Currently PrometheusRemoteWriteBehind is based on that, and would fire
if an endpoint is only interested in a subset of series that take time
to appear.

A "prometheus_remote_storage_queue_highest_timestamp_seconds" that only
takes into account samples in the queue is introduced, and used in
PrometheusRemoteWriteBehind and dashboards in documentation/prometheus-mixin

Same applies to samplesIn/dataIn, QueueManager should know more about
when to update those; when data is enqueued.

That makes dataDropped unnecessary, which helps simplify the logic
in QueueManager.calculateDesiredShards()

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-09-01 13:19:24 +02:00
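
The reasoning in this commit message can be sketched with a toy model. This is not the QueueManager code: the `sample` and `queue` types, the `keep` filter standing in for relabelling, and the field names are all hypothetical. It only shows why a per-queue highest timestamp differs from a global one when a queue selects a subset of series.

```go
package main

import "fmt"

// sample is a hypothetical series sample with a timestamp.
type sample struct {
	Name string
	T    int64
}

// queue is a toy stand-in for a per-endpoint remote-write queue.
type queue struct {
	keep      func(sample) bool // stands in for relabelling rules
	highestTS int64             // highest timestamp actually enqueued
	samplesIn int               // number of samples actually enqueued
}

// enqueue updates the counters only for samples that survive the filter,
// so samples dropped by relabelling never influence highestTS.
func (q *queue) enqueue(batch []sample) {
	for _, s := range batch {
		if !q.keep(s) {
			continue
		}
		q.samplesIn++
		if s.T > q.highestTS {
			q.highestTS = s.T
		}
	}
}

func main() {
	q := &queue{keep: func(s sample) bool { return s.Name == "a" }}
	q.enqueue([]sample{{"a", 10}, {"b", 99}, {"a", 20}})
	// A global highest timestamp would be 99; the queue-level one is 20.
	fmt.Println(q.highestTS, q.samplesIn)
}
```

With the counters kept at queue level, an alert comparing the highest enqueued timestamp against the highest sent timestamp no longer sees series the endpoint never receives.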
Ayoub Mrini
2cbeef6d95
Merge pull request #17090 from machine424/bump_go
chore: add a "make bump-go-version" to handle Go bumps across all files
2025-09-01 13:05:29 +02:00
Bartlomiej Plotka
18626a99c4
fix(rules.Manager): ensure non-nil context (#17103)
Saw some panic on main due to lack of defaulting:
https://github.com/prometheus/prometheus/actions/runs/17317373582/job/49162760911

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-08-29 10:43:59 +01:00
bwplotka
172cde8af1 Revert "feat(storage): add new CombinedAppender interface and compatibility layer"
This reverts commit 2fb680a229ee907e71e332066ed41f84f7714319.
2025-08-29 08:16:39 +01:00
bwplotka
794bf774c2 Reapply "prw: use Unit and Type labels for metadata when feature flag is enabled (#17033)"
This reverts commit f5fab4757733746a708e7b80324b8929c1b84856.
2025-08-29 08:16:37 +01:00
bwplotka
f5fab47577 Revert "prw: use Unit and Type labels for metadata when feature flag is enabled (#17033)"
This reverts commit c808a71e18d4d1cc91e1d06859ebeae818465324.
2025-08-29 08:15:28 +01:00
bwplotka
2fb680a229 feat(storage): add new CombinedAppender interface and compatibility layer
Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-08-29 08:14:34 +01:00
Jonathan
c808a71e18
prw: use Unit and Type labels for metadata when feature flag is enabled (#17033)
* chore: send Unit and Type when feature flag is enabled

Signed-off-by: perebaj <perebaj@gmail.com>

* remove unused code and comments

Signed-off-by: perebaj <perebaj@gmail.com>

* remove unreal scenario

Signed-off-by: perebaj <perebaj@gmail.com>

* remove unused if

Signed-off-by: perebaj <perebaj@gmail.com>

* remove unused labels

Signed-off-by: perebaj <perebaj@gmail.com>

* linter

Signed-off-by: perebaj <perebaj@gmail.com>

* enable type and unit through remotewrite config

Signed-off-by: perebaj <perebaj@gmail.com>

* remove test comment and capture type and unit when flag is enabled

Signed-off-by: perebaj <perebaj@gmail.com>

* gofumpt

Signed-off-by: perebaj <perebaj@gmail.com>

* modelTypeToWriteV2Type

Signed-off-by: perebaj <perebaj@gmail.com>

* use NewMetadataFromLabels

Signed-off-by: perebaj <perebaj@gmail.com>

* capture feature flag from main

Signed-off-by: perebaj <perebaj@gmail.com>

* simplifying logic

Signed-off-by: perebaj <perebaj@gmail.com>

* remove unused function

Signed-off-by: perebaj <perebaj@gmail.com>

* formatting code

Signed-off-by: perebaj <perebaj@gmail.com>

* gofumpt

Signed-off-by: perebaj <perebaj@gmail.com>

* remove public var: EnableTypeAndUnitLabels

Signed-off-by: perebaj <perebaj@gmail.com>

* remove enableTypeAndUnitLabels from TestPopulateV2TimeSeries_typeAndUnitLabels

Signed-off-by: perebaj <perebaj@gmail.com>

* remove enableTypeAndUnitLabels from main

Signed-off-by: perebaj <perebaj@gmail.com>

* use schema helper to populate metadata

Signed-off-by: perebaj <perebaj@gmail.com>

* remove metadata since nil is the default value

Signed-off-by: perebaj <perebaj@gmail.com>

* add TestPopulateV2TimeSeries_UnexpectedMetadata

Signed-off-by: perebaj <perebaj@gmail.com>

* Update storage/remote/queue_manager_test.go

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

---------

Signed-off-by: perebaj <perebaj@gmail.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2025-08-29 04:10:01 +00:00
Björn Rabenstein
ba808d1736
Merge pull request #17092 from prometheus/beorn7/cleanup
Apply analyzer modernize to the whole codebase
2025-08-28 00:42:33 +02:00
Bryan Boreham
2fb50b12cd
[PERF] TSDB: Optimize appender creation on empty chunks (#16922)
Skip creating an iterator and walking all through any existing values,
when we can easily tell there are no existing values.

This is the normal case - the TSDB head creates an appender immediately
after creating every chunk.

Remove redundant handling of empty chunks.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-27 17:11:08 +01:00
Bryan Boreham
4a782634a4
Merge pull request #17093 from prometheus/beorn7/histogram
tsdb: Remove unused `Layout()` methods
2025-08-27 16:26:43 +01:00
beorn7
23f1d3ba25 tsdb: Remove unused Layout() methods
Both `HistogramChunk` and `FloatHistogramChunk` have a `Layout()`
method for historical reasons. As it has turned out, these methods are
unused and also buggy. This commit simply removes them.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-27 17:01:23 +02:00
beorn7
71c21fb9e4 Fix minor issues after applying analyzer "modernize"
- The tool left an empty line behind that we don't need anymore, see
  https://github.com/prometheus/prometheus/pull/17092. (Arguably not a
  bug in the tool but just our stricter style about empty lines.)

- In tsdb/index/postings_test.go , our (admittedly somewhat
  convoluted) code structure tricked the tool so it spit out something
  that wouldn't even compile.

- storage/remote/queue_manager_test.go is just a minor formatting
  nit.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-27 15:44:11 +02:00
beorn7
747c5ee2b1 Apply analyzer "modernize" to the whole codebase
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.

This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency about what
was done automatically.

Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-27 14:48:41 +02:00
Ayoub Mrini
9cbb3a66c9
Merge pull request #17063 from machine424/muttt
test(notifier): add a test showing an alert mutation bug between alertmanager_config and fix it
2025-08-27 11:13:00 +02:00
pipiland2612
0246aa22f4 Parallel test
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-08-27 10:47:39 +02:00
Darkknight
9fc4212214
revert unexpected metadata metric for RWV2 and add log on unexpected metadata instead. (#17082)
Signed-off-by: leegin <leegin.t@gmail.com>
2025-08-26 11:54:14 -07:00
SuperQ
b1802bae0c
Add changelog entry
Add changelog entry to pull in #16925 to v3.6.0.

Signed-off-by: SuperQ <superq@gmail.com>
2025-08-26 18:16:42 +02:00
Ganesh Vernekar
b98cc631a2
Restore stale series count from chunk snapshots
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-26 15:45:53 +02:00
Ganesh Vernekar
c3789ff547
Restore stale series count on WAL replay
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-26 15:45:45 +02:00
Ganesh Vernekar
787fe92e86
Test the stale series tracking in Head
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-26 15:45:39 +02:00
Ganesh Vernekar
4a37fd886f
Track stale series in the Head
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-26 15:45:32 +02:00
Lukasz Mierzwa
31282d67b7 Log when GC / block write starts
Right now Prometheus only logs when these operations are completed.
It's a bit surprising to suddenly see a message saying "I was busy doing X
for the past N minutes", so let's add a message when the operation starts,
so it's easier to understand what Prometheus was doing at any point in time
when reading logs.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-08-26 10:30:22 +01:00
bragi92
20580b6ba8
remote_write azure auth : add workload identity support (#16788)
* initial changes

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* .

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* fix comments

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* fix tenantid test

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* style

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* pr feedback

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

---------

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2025-08-26 07:14:47 +01:00
machine424
8f79470ca9
fix(notifier): create a new alert when relabeling alters labels
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-08-26 07:37:45 +02:00
SuperQ
b87cbf0294
Fixup err nil checks
Cleanup double `if` statements for errors being nil / not-nil.

Signed-off-by: SuperQ <superq@gmail.com>
2025-08-25 17:37:02 +02:00
machine424
bd725fd6b8
test(notifier): add a test showing an alert mutation bug between alertmanager_config (alertmanagersets)
The alert_relabel_configs should only apply to the corresponding alertmanagerset

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-08-25 17:04:14 +02:00
Darkknight
7cf585527f
remote_write: add metric for unexpected metadata in populateV2TimeSeries (#17034)
add metric to track unexpected metadata seen in populateV2TimeSeries, which would indicate metadata incorrectly routed in queue_manager code paths

---------

Signed-off-by: leegin <leegin.t@gmail.com>
Signed-off-by: Darkknight <leegin.t@gmail.com>
2025-08-22 10:33:52 -07:00
Bryan Boreham
153cdb2b0b [PERF] PromQL: Replace Fprintf %f with AppendFloat
The combination of `AvailableBuffer` followed by `Write` is optimised
inside `bytes.Buffer`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-22 10:58:01 +01:00
Minh Nguyen
c8deefb038
[tsdb] Add CounterResetHint: CounterReset to synthetic zero sample (#17011)
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-08-21 23:26:01 +02:00
Marco Pracucci
954cad35b2
Optimise concurrent rule evaluation for rules querying ALERTS and ALERTS_FOR_STATE (#17064)
* Optimise concurrent rule evaluation for rules querying ALERTS and ALERTS_FOR_STATE

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Further optimised the case of ALERTS and ALERTS_FOR_STATE without alertname label matcher

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2025-08-21 16:57:57 +02:00
machine424
9855613435 fix PR number
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
v0.306.0-rc.0 v3.6.0-rc.0
2025-08-21 15:28:03 +02:00
machine424
94b4c49a76 apply bboreham's suggestions
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-08-21 15:28:03 +02:00
machine424
157ed00d9d chore: prepare release 3.6.0-rc.0
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-08-21 15:28:03 +02:00
Bryan Boreham
b8d2d505f5 [PERF] PromQL: Replace some Sprintf with bytes.Buffer
Goes faster due to reduced memory allocation.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-21 11:38:05 +01:00
Bryan Boreham
49d9261693 [PERF] PromQL: Replace some simple Sprintf with string concat
This goes faster because there is no runtime format parsing.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-21 11:05:49 +01:00
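
The change described here can be shown with a small sketch. The helper names and the `op(arg)` shape are hypothetical, not taken from the PromQL printer; the point is only that plain concatenation produces the same string without parsing a format string at run time.

```go
package main

import "fmt"

// callSprintf builds "op(arg)" through fmt, which interprets the format
// string on every call.
func callSprintf(op, arg string) string {
	return fmt.Sprintf("%s(%s)", op, arg)
}

// callConcat builds the same string with plain concatenation.
func callConcat(op, arg string) string {
	return op + "(" + arg + ")"
}

func main() {
	fmt.Println(callSprintf("rate", "x[5m]"))
	fmt.Println(callConcat("rate", "x[5m]"))
}
```

Any speed difference is best confirmed with a benchmark, as the BenchmarkExprString commit in this series does.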
Bryan Boreham
e44ee2f182 [TESTS] PromQL: Add BenchmarkExprString
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-21 11:05:38 +01:00
Bryan Boreham
66fbea97bb [TESTS] Check expr with function call in TestExprString
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-21 11:04:46 +01:00
Bryan Boreham
8b3f59e9c3
Merge pull request #16593 from bboreham/ast-child-iter
[PERF] PromQL: Reduce allocations when walking syntax tree
2025-08-21 09:14:41 +01:00