16360 Commits

Author SHA1 Message Date
liangmulu
b1a7df2c0c chore: fix some minor issues in comments
Signed-off-by: liangmulu <liangmulu@outlook.com>
2025-07-09 18:05:41 +08:00
Kapil Lamba
df0e034314 address code review comments
Signed-off-by: Kapil Lamba <kapillamba4@gmail.com>
2025-07-09 07:25:31 +05:30
Björn Rabenstein
d8c921804e
Merge pull request #16824 from afhassan/main
tsdb: add count of histogram samples to block stats
2025-07-08 20:16:13 +02:00
Björn Rabenstein
dbee82267a
Merge pull request #16725 from MichaHoffmann/mhoffmann/fix-topk-nan-arg-error-on-nonexisting-series
promql: fix topk error on NaN argument for non-existing series
2025-07-08 19:42:20 +02:00
beorn7
bcf7a822a0 promql: Prevent extrapolation below zero for histogram count
This deals with the count field of native histograms in the same way
as with simple float counters. It then scale the whole histogram with
the same factor as it has scaled the count. This will still allow
individual buckets to get extrapolated below zero, but maybe that is
fine.

This implements approach (2) as described in
https://github.com/prometheus/prometheus/issues/15976#issuecomment-3032095158

Signed-off-by: beorn7 <beorn@grafana.com>
2025-07-08 19:01:31 +02:00
Vlad Shulcz
19fa1ed008
test(rulefmt): fix description annotation index in TestParseFileSuccessWithAliases (#16839)
Signed-off-by: shulcz <vshulcz@gmail.com>
2025-07-08 18:38:34 +02:00
Björn Rabenstein
c565e95808
Merge pull request #16825 from prometheus/beorn7/histogram
promql: add tests to demonstrate extrapolation below zero
2025-07-08 16:42:56 +02:00
chenlujjj
a2735494e1
chore: complete error message in RegisterSDMetrics function (#14635)
Signed-off-by: chenlujjj <953546398@qq.com>
2025-07-08 12:05:24 +00:00
Arthur Silva Sens
4b9d0fb92f
Revert: OTLP Support including scope metadata as metric labels (#16842)
Reverts #16730 and #16760

This is being done because we've noticed a problem in the spec that could
lead to name collisions if attributes name, version or schema_url are added
to the scope. They would collide with the already reserved labels
otel_scope_name, otel_scope_version and otel_scope_schema_url.

Since this new configuration option never made it into a release, we can
safely remove it from the 3.5 release. We'll sort this out for the 3.6 release

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
2025-07-08 10:37:19 +00:00
Ahmed Hassan
01be7bfb2e add NumFloatSamples to TSDB block stats
Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>
2025-07-07 13:48:18 -07:00
Lukasz Mierzwa
559fd44be6 Rename labels.go -> labels_slicelabels.go
labels.go is now holding slicelabels code, so let's rename it.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-07-07 12:37:42 +01:00
machine424
ffcba01c5a chore: do not hardcode required versions in README.md
add links to the sources of truth.

It's hard to keep up to date, the "go" one
is "wrong" (not really as an old 1.22 binray could still
download/use newer toolchains...) for example.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-07-07 08:42:31 +01:00
Charles Korn
1e58d792a5
storage/remote: fix "http: read on closed response body" errors if chunkedSeriesSet.Next is called again after the series set is exhausted (#16838)
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2025-07-07 09:23:34 +02:00
Michael Hoffmann
44ee5e2ad6 promql: fix topk error on NaN argument for non-existing series
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-07 06:19:39 +00:00
RaphSku
938e5cb62b
docs: Added documentation for promtool configuration with http.config.file (#16522)
Includes an example.

Signed-off-by: RaphSku <rapsku.dev@gmail.com>
2025-07-07 00:00:51 +02:00
beorn7
c0a13223e7 promql: add tests to demonstrate extrapolation below zero
This shows how float counters cannot go below zero when extrapolationg
for rate/increase, and how histograms do not have that protection yet,
leading to an overestimation of the rate/increase.

This also demonstrates edge cases where the count extrapolation does
not need to be limited, but an individual bucket still goes below
zero.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-07-06 23:42:55 +02:00
Michael Hoffmann
21b1536b5a
storage: add projection fields to select hints (#16423)
This commit adds Projection metadata to SelectHints so that downstream
storage implementations can use it to save effort when answering to
Select calls.

Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
2025-07-06 12:57:19 +02:00
Arve Knudsen
f561aa795d
OTLP receiver: Generate target_info samples between the earliest and latest samples per resource (#16737)
* OTLP receiver: Generate target_info samples between the earliest and latest samples per resource

Modify the OTLP receiver to generate target_info samples between the earliest
and latest samples per resource instead of only one for the latest timestamp.
The samples are spaced lookback delta/2 apart.

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-07-04 14:38:16 +00:00
Jon Kartago Lamida
819500bdbc
Add ByteSize method for Labels (#16717)
Add `ByteSize()` method to different labels implementations.
One of the use case so that we can track the memory used by Labels.

Signed-off-by: Jon Kartago Lamida <me@lamida.net>
2025-07-04 15:09:01 +01:00
sujal shah
4408a6bcaf api: Create /status/tsdb/blocks endpoint.
this endpoint serves blocks data to the client.

Signed-off-by: sujal shah <sujalshah28092004@gmail.com>
2025-07-04 03:13:54 +05:30
machine424
c2d6e528e4
feat(discovery/kubernetes): allow attaching namespace metadata
to endpointslice, endpoints and pod roles

after injecting the labels for endpointslice, claude-4-sonnet
helped transpose the code and tests to endpoints and pod roles

fixes https://github.com/prometheus/prometheus/issues/9510
supersedes https://github.com/prometheus/prometheus/pull/13798

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: Paul BARRIE <paul.barrie.calmels@gmail.com>
2025-07-03 19:41:08 +02:00
Arve Knudsen
5a5424cbc1
Consolidate around prometheus/common/model.ValidationScheme (#16806)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-07-03 15:37:46 +02:00
Bartlomiej Plotka
419d436a44
Merge pull request #16822 from prometheus/bump-otlptranslator
Bump otlptranslator to latest SHA
2025-07-03 12:40:31 +01:00
Matthias Loibl
61064cb774
Merge pull request #16819 from jscheffner/prometheus-dashboard-uid
mixin: add uid to prometheus overview dashboard
2025-07-03 11:16:05 +02:00
Julien
011c7fe87d
Merge pull request #16820 from prymitive/discoveryRace
discovery: fix a race in ApplyConfig while Prometheus is being stopped
2025-07-03 10:52:59 +02:00
dependabot[bot]
ce2e48f39e
build(deps): bump github.com/open-telemetry/opentelemetry-collector-contrib/processor/deltatocumulativeprocessor
Bumps [github.com/open-telemetry/opentelemetry-collector-contrib/processor/deltatocumulativeprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib) from 0.128.0 to 0.129.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-collector-contrib/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CHANGELOG-API.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-collector-contrib/compare/v0.128.0...v0.129.0)

---
updated-dependencies:
- dependency-name: github.com/open-telemetry/opentelemetry-collector-contrib/processor/deltatocumulativeprocessor
  dependency-version: 0.129.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-07-03 08:10:56 +00:00
github-actions[bot]
3c25eb2a0d
Merge pull request #16815 from prometheus/dependabot/go_modules/github.com/oklog/run-1.2.0
build(deps): bump github.com/oklog/run from 1.1.0 to 1.2.0
2025-07-03 10:09:10 +02:00
Ahmed Hassan
6d77b47d13 add numHistogramSamples to block stats
Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>
2025-07-02 19:52:04 -07:00
Arthur Silva Sens
0502f2d8fb
Bump otlptranslator to latest SHA
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
2025-07-02 14:55:51 -03:00
Bryan Boreham
74aca682b7
Merge pull request #16807 from bboreham/test-sizeoflabels
[TESTS] Labels: Add a test for SizeOfLabels
2025-07-02 18:44:10 +01:00
Lukasz Mierzwa
b49d143595 Fix a race in discovery manager ApplyConfig & shutdown
If we call ApplyConfig() at the same time the manager is being stopped we might end up hanging forever.
This is because ApplyConfig() will try to cancel obsolete providers and wait until they are cancelled.
It's done by setting a done() function that call Done() on a sync.WaitGroup:

```
if len(prov.newSubs) == 0 {
	wg.Add(1)
	prov.done = func() {
		wg.Done()
	}
}
```

then calling prov.cancel() and finally waiting until all providers run done() function
that by blocking it all on a wg.Wait() call.

For each provider there is a goroutine created by calling Manager.startProvider(*Provider):

```
func (m *Manager) startProvider(ctx context.Context, p *Provider) {
	m.logger.Debug("Starting provider", "provider", p.name, "subs", fmt.Sprintf("%v", p.subs))
	ctx, cancel := context.WithCancel(ctx)
	updates := make(chan []*targetgroup.Group)

	p.mu.Lock()
	p.cancel = cancel
	p.mu.Unlock()

	go p.d.Run(ctx, updates)
	go m.updater(ctx, p, updates)
}
```

It creates a context that can be cancelled and that cancel function becomes prov.cancel. This is what ApplyConfig will call.
If we look at the body of updater() method:

```
func (m *Manager) updater(ctx context.Context, p *Provider, updates chan []*targetgroup.Group) {
	// Ensure targets from this provider are cleaned up.
	defer m.cleaner(p)
	for {
		select {
		case <-ctx.Done():
			return
[...]
```

we can see that it will exit if that context is cancelled and that will trigger a call to Manager.cleaner().
That cleaner() is where done() is called.
So ApplyConfig() -> calls cancel() -> causes cleaner() to be executed -> calls done().

cancel() is also called from cancelDiscoverers() method that will be called by Manager.Run() when Manager is stopping:

```
func (m *Manager) Run() error {
	go m.sender()
	<-m.ctx.Done()
	m.cancelDiscoverers()
	return m.ctx.Err()
}
```

The problem is that if we call both ApplyConfig and stop the manager at the same time we might end up with:

- We call Manager.ApplyConfig()
- We stop the Manager
- Manager.cancelDiscoverers() is called
- Provider.cancel() is called for every Provider
- cancel() causes provider context to be cancelled which terminates updater() for given Provider
- cancelling context causes cleaner() method to be called for given Provider
- cleaner() calls done() and exits
- Provider is considered stopped at this point, there is no goroutine running that will call done() anymore
- ApplyConfig iterates providers and decides that one is obsolete is must be stopped
- It sets a custom done() function body with a WaitGroup.Done() call in it
- Then ApplyConfig waits until all Providers run done()
- But they are all stopped and no done() will be run
- We wait forever

This only happens if cancelDiscoverers() is run before ApplyConfig, if ApplyConfig runs first done() will be called,
if cancelDiscoverers() is called first it will stop updater() instances and so done() won't be called anymore.

Part of the problem is that there is no distinction between running and stopped providers. There is Provider.IsStarted() method
that returns a bool based on the value of cancel function but ApplyConfig doesn't check it.
Second problem is that although there is a mutex on a Provider it's used much in the code, so two goroutines can try to read and/or write
provider.cancel and/or provider.done at the same time, making it all more likely to race.

The easiest way to fix it is to check if the provider is started inside ApplyConfig so we don't try to stop a provider that's already stopped.
For that we need to mark it as stopped after cancel() is called, by setting cancel to nil.
This also needs better lock usage to avoid different parts of the code trying to set cancel and done at the same time.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-07-02 16:03:10 +01:00
Lukasz Mierzwa
357e652044 Add a test for a rare shutdown hang
When doing a config reload that need to stop some providers while also sending SIGTERM to Prometheus at the same time can sometimes hang

1: sync.WaitGroup.Wait [83 minutes] [Created by run.(*Group).Run in goroutine 1 @ group.go:37]
    sync         sema.go:110              runtime_SemacquireWaitGroup(*uint32(#166))
    sync         waitgroup.go:118         (*WaitGroup).Wait(*WaitGroup(#23))
    discovery    manager.go:276           (*Manager).ApplyConfig(#23, #167)
    main         main.go:964              main.func5(#120)
    main         main.go:1505             reloadConfig({#183, 0x1b}, 1, #40, #43, #50, {#31, 0xa, 0})
    main         main.go:1182             main.func22()
    run          group.go:38              (*Group).Run.func1(*Group(#26), #51)

Add a test for it.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-07-02 16:01:42 +01:00
wmTJc9IK0Q
c481aaf762
codemirror-promql: Preserve source files in npm package (#16804)
* Preserve source files in codemirror-promql package

This allows for sourcemaps to work when the package is imported via ESM-native CDNs such as esm.sh

Signed-off-by: wmTJc9IK0Q <171362836+wmTJc9IK0Q@users.noreply.github.com>

* Preserve source files in lezer-promql package

Signed-off-by: wmTJc9IK0Q <171362836+wmTJc9IK0Q@users.noreply.github.com>

---------

Signed-off-by: wmTJc9IK0Q <171362836+wmTJc9IK0Q@users.noreply.github.com>
2025-07-02 15:31:02 +02:00
jscheffner
1be2deec88 mixin: add uid to prometheus overview dashboard
Signed-off-by: jscheffner <jscheffner@users.noreply.github.com>
2025-07-02 15:02:50 +02:00
Julien
f62d0e0385
Merge pull request #16777 from roidelapluie/add-step-promql
Add step(), min() and max() in promql duration expressions
2025-07-02 14:27:45 +02:00
Julien
432f130a32 PromQL: min/max/step: Address review comments
Signed-off-by: Julien <291750+roidelapluie@users.noreply.github.com>
2025-07-02 11:17:36 +02:00
Julien Pivotto
984c8de0da PromQL: Fix printing +min()
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
2025-07-02 11:17:17 +02:00
Julien Pivotto
3af0bdee68 PromQL: min/max/step: add more tests
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
2025-07-02 11:17:17 +02:00
Julien Pivotto
ee7d5158a7 Add step(), min(a,b) and max(a,b) in promql duration expressions
step() is a new keyword introduced to represent the query step width in duration expressions.

min(a,b) and max(a,b) return the min and max from two duration expressions.

Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
2025-07-02 11:17:17 +02:00
Bryan Boreham
4eafbcae93 lint
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-07-02 09:56:28 +01:00
Bryan Boreham
e7ac3f440d [TESTS] Labels: Add a test for SizeOfLabels
This requires a bit of repetition to cover all the different builds, but
it seems worth checking that the function does what is expected.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-07-02 09:31:27 +01:00
Bryan Boreham
507227781b [REFACTOR] Labels: Extract test case data from TestLabels_String
So we can use them in other tests.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-07-02 09:31:25 +01:00
Julius Volz
bfbae39931
Merge pull request #16716 from charleskorn/charleskorn/binops-docs
docs: clarify and expand binary operations documentation
2025-07-02 10:02:17 +02:00
dependabot[bot]
f7372ec7d7
build(deps): bump github.com/docker/docker
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 28.2.2+incompatible to 28.3.0+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](https://github.com/docker/docker/compare/v28.2.2...v28.3.0)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-version: 28.3.0+incompatible
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-07-01 23:42:52 +00:00
dependabot[bot]
6bb7e088c5
build(deps): bump github.com/oklog/run from 1.1.0 to 1.2.0
Bumps [github.com/oklog/run](https://github.com/oklog/run) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/oklog/run/releases)
- [Commits](https://github.com/oklog/run/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: github.com/oklog/run
  dependency-version: 1.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-07-01 23:42:33 +00:00
dependabot[bot]
a92a5640c0
build(deps): bump google.golang.org/api from 0.238.0 to 0.239.0
Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.238.0 to 0.239.0.
- [Release notes](https://github.com/googleapis/google-api-go-client/releases)
- [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md)
- [Commits](https://github.com/googleapis/google-api-go-client/compare/v0.238.0...v0.239.0)

---
updated-dependencies:
- dependency-name: google.golang.org/api
  dependency-version: 0.239.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-07-01 23:42:25 +00:00
dependabot[bot]
12c0ef6e0c
build(deps): bump github.com/linode/linodego from 1.52.1 to 1.52.2
Bumps [github.com/linode/linodego](https://github.com/linode/linodego) from 1.52.1 to 1.52.2.
- [Release notes](https://github.com/linode/linodego/releases)
- [Commits](https://github.com/linode/linodego/compare/v1.52.1...v1.52.2)

---
updated-dependencies:
- dependency-name: github.com/linode/linodego
  dependency-version: 1.52.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-07-01 23:41:41 +00:00
dependabot[bot]
5ca501e648
build(deps): bump github/codeql-action from 3.28.16 to 3.29.2
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3.28.16 to 3.29.2.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](28deaeda66...181d5eefc2)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.29.2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-07-01 23:20:21 +00:00
Lukasz Mierzwa
bb690a23b9 Make sure we never call trackStaleness with nil cache entry
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-07-01 14:22:01 +01:00
Lukasz Mierzwa
6687bf5653 Only add series to scrape cache if they were appended to TSDB
Scrape cache is used to emit StaleNaN markers after a series disappears so it should only hold entries for series that did end up in TSDB, which is not always the case due to sample_limit.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-07-01 14:22:01 +01:00