prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-12-04 00:51:02 +01:00

Author	SHA1	Message	Date
Nicolás Pazos	b43a07248f	tsdb tests: fix `mockIndex` implementation Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>	2025-07-10 15:59:38 -03:00
Owen Williams	d2f1f4fb27	config: Add UnderscoreEscapingWithoutSuffixes translation strategy (#16849) The last permutation of the translation options does underscore translation but does not add suffixes. This translation option already exists in Mimir as otel_metric_suffixes_enabled, indicating external demand for this strategy. There is an accompanying update to prometheus-docs to explain the use of this mode: https://github.com/prometheus/docs/pull/2688 Signed-off-by: Owen Williams <owen.williams@grafana.com>	2025-07-10 11:27:23 -04:00
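The commit frames the translation strategies as permutations of two independent choices: whether names are escaped to underscores, and whether suffixes are added. Here is a minimal Go sketch of just that two-by-two matrix. The strategy names are assumed to match the Prometheus configuration values. `namingStrategy`, `escapeName`, `translateName` and the precomputed `unitSuffix` parameter are hypothetical. The real OTLP name translation also handles units, metric types and many edge cases that this sketch ignores.

```go
package main

import "strings"

// namingStrategy models the two independent choices behind each strategy
// (hypothetical type, not Prometheus's API).
type namingStrategy struct {
	escapeUnderscores bool // replace characters outside [a-zA-Z0-9_:] with '_'
	addSuffixes       bool // append a unit/type suffix
}

// strategies lists the four permutations, keyed by the assumed config names.
var strategies = map[string]namingStrategy{
	"UnderscoreEscapingWithSuffixes":    {escapeUnderscores: true, addSuffixes: true},
	"UnderscoreEscapingWithoutSuffixes": {escapeUnderscores: true, addSuffixes: false},
	"NoUTF8EscapingWithSuffixes":        {escapeUnderscores: false, addSuffixes: true},
	"NoTranslation":                     {escapeUnderscores: false, addSuffixes: false},
}

// escapeName replaces every rune outside [a-zA-Z0-9_:] with '_'.
func escapeName(name string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r == '_' || r == ':',
			r >= 'a' && r <= 'z',
			r >= 'A' && r <= 'Z',
			r >= '0' && r <= '9':
			return r
		}
		return '_'
	}, name)
}

// translateName applies one strategy to an OTLP metric name. unitSuffix is a
// hypothetical, already-computed suffix such as "seconds".
func translateName(name, unitSuffix string, s namingStrategy) string {
	if s.escapeUnderscores {
		name = escapeName(name)
	}
	if s.addSuffixes && unitSuffix != "" {
		name += "_" + unitSuffix
	}
	return name
}
```

In this model, `http.server.duration` with a `seconds` suffix becomes `http_server_duration` under `UnderscoreEscapingWithoutSuffixes` and `http_server_duration_seconds` under `UnderscoreEscapingWithSuffixes`.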
Björn Rabenstein	b7f984d6d2	Merge pull request #16585 from kapillamba4/fix/16393-strict Convert PromQL tests to new syntax via basic migration mode	2025-07-10 15:45:38 +02:00
Björn Rabenstein	eb3ea163fa	promqltest: add tests for `histogram_count(increase(...))` (#16854) As `histogram_count` is playing tricks to improve performance, we had better make sure that the limitation of extrapolation below zero still works as expected. Signed-off-by: beorn7 <beorn@grafana.com>	2025-07-10 15:44:02 +02:00
Charles Korn	8397b738bf	docs: clarify docs for PromQL aggregation operators (#16837) Signed-off-by: Charles Korn <charles.korn@grafana.com>	2025-07-10 15:34:57 +02:00
Björn Rabenstein	362141370d	Merge pull request #16828 from prometheus/beorn7/histogram2 promql(histograms): scale a histogram the same as the count	2025-07-10 13:26:15 +02:00
Björn Rabenstein	0672a5b045	Merge pull request #16847 from prometheus/beorn7/promql promqltest: Test NaN sample values for quantile aggregator	2025-07-10 11:16:12 +02:00
jingchanglu	9ddb21fccb	chore: fix some function names in comment Signed-off-by: jingchanglu <jingchanglu@outlook.com>	2025-07-10 14:43:25 +08:00
Dmitry Ponomaryov	b18272a572	Add template functions to support various use cases. (#16619) Presumably, this will help with Loki alerts, but the added functionality is also generally useful. For one, this enables `parseDuration` to also accept negative durations (as that's something that is also used in PromQL by now). This also adds a function `now` to return the evaluation time of the template (as seconds since epoch AKA Unix time) and a function `toDuration` (akin to `toTime`), which creates a Go `time.Duration` from a duration in seconds. --------- Signed-off-by: Dmitry Ponomaryov <me@halje.ru> Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>	2025-07-10 00:33:20 +02:00
machine424	846acc10bb	chore(tsdb): remove NewLeveledCompactorWithChunkSize constructor as unused, library users can redefine it on their side Signed-off-by: machine424 <ayoubmrini424@gmail.com>	2025-07-09 17:10:13 +01:00
machine424	020e803ee0	chore(discovery): remove unused StaticProvider struct, library users can easily define it on their side Signed-off-by: machine424 <ayoubmrini424@gmail.com>	2025-07-09 17:10:13 +01:00
George Krajcsovits	1d79f0f47e	chore(tsdb): add a few more testcases for unlock of unlocked mtx 16332 (#16848) Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-07-09 16:24:46 +02:00
Banana Duck	89f011ba13	fix: unlock of unlocked mutex (#16332) * fix: unlock on unlocked mutex Signed-off-by: Usama Alhanaqtah <a.usama@yandex.ru> * test coverage Signed-off-by: Usama Alhanaqtah <a.usama@yandex.ru> --------- Signed-off-by: Usama Alhanaqtah <a.usama@yandex.ru> Co-authored-by: alhanaqtah.usama <alhanaqtah.usama@DEV-254.local>	2025-07-09 15:37:55 +02:00
Björn Rabenstein	d86796863f	Merge pull request #16764 from bboreham/go-get-no-d [BUILD] Don't specify -d for go get	2025-07-09 14:14:05 +02:00
beorn7	107e4a00c3	promqltest: Test NaN sample values for quantile aggregator Signed-off-by: beorn7 <beorn@grafana.com>	2025-07-09 13:38:19 +02:00
Bryan Boreham	eea203702c	Prepare release 3.5.0-rc.1 (#16845) This RC reverts the feature "OTLP: Support promoting OTel scope attributes". Add the line back into the CHANGELOG for 3.5.0-rc.0, since we are not changing that version. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> v3.5.0-rc.1	2025-07-09 12:07:27 +01:00
Björn Rabenstein	181415c7b7	Merge pull request #16846 from liangmulu/main docs: fix some minor issues in comments	2025-07-09 13:00:13 +02:00
liangmulu	b1a7df2c0c	chore: fix some minor issues in comments Signed-off-by: liangmulu <liangmulu@outlook.com>	2025-07-09 18:05:41 +08:00
Kapil Lamba	df0e034314	address code review comments Signed-off-by: Kapil Lamba <kapillamba4@gmail.com>	2025-07-09 07:25:31 +05:30
Björn Rabenstein	d8c921804e	Merge pull request #16824 from afhassan/main tsdb: add count of histogram samples to block stats	2025-07-08 20:16:13 +02:00
Björn Rabenstein	dbee82267a	Merge pull request #16725 from MichaHoffmann/mhoffmann/fix-topk-nan-arg-error-on-nonexisting-series promql: fix topk error on NaN argument for non-existing series	2025-07-08 19:42:20 +02:00
beorn7	bcf7a822a0	promql: Prevent extrapolation below zero for histogram count This deals with the count field of native histograms in the same way as with simple float counters. It then scales the whole histogram by the same factor as it has scaled the count. This will still allow individual buckets to get extrapolated below zero, but maybe that is fine. This implements approach (2) as described in https://github.com/prometheus/prometheus/issues/15976#issuecomment-3032095158 Signed-off-by: beorn7 <beorn@grafana.com>	2025-07-08 19:01:31 +02:00
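The approach above can be sketched as follows. For float counters, `rate`/`increase` shortens the extrapolation toward the start of the range when going that far back would push the counter below zero. The same factor is computed from the histogram count and then applied to every bucket. This is a simplified model, not the PromQL engine's code. The `histogram` type, `countFactor` and `extrapolate` are hypothetical. The durations are assumed to have already passed the engine's usual extrapolation-threshold checks.

```go
package main

// histogram is a hypothetical stand-in for a native histogram: an overall
// count plus per-bucket counts, here representing the increase over the range.
type histogram struct {
	Count   float64
	Buckets []float64
}

// countFactor computes the extrapolation factor as for a float counter:
// durationToStart is cut short if extrapolating that far back would take
// the counter below zero.
func countFactor(firstCount, deltaCount, sampledInterval, durationToStart, durationToEnd float64) float64 {
	if deltaCount > 0 && firstCount >= 0 {
		durationToZero := sampledInterval * (firstCount / deltaCount)
		if durationToZero < durationToStart {
			durationToStart = durationToZero
		}
	}
	return (sampledInterval + durationToStart + durationToEnd) / sampledInterval
}

// extrapolate scales the whole delta histogram by the factor derived from
// the count alone, so the count and buckets are scaled consistently.
func extrapolate(firstCount float64, delta histogram, sampledInterval, durationToStart, durationToEnd float64) histogram {
	f := countFactor(firstCount, delta.Count, sampledInterval, durationToStart, durationToEnd)
	out := histogram{Count: delta.Count * f, Buckets: make([]float64, len(delta.Buckets))}
	for i, b := range delta.Buckets {
		out.Buckets[i] = b * f
	}
	return out
}
```

With a first count of 5, an increase of 10 over a 50s sampled interval, and 40s to extrapolate toward the start, the count would hit zero after 25s. The factor therefore becomes (50 + 25) / 50 = 1.5 rather than 1.8. Because buckets are only scaled, an individual bucket can still be implied to start below zero, matching the caveat in the commit message.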
Vlad Shulcz	19fa1ed008	test(rulefmt): fix description annotation index in TestParseFileSuccessWithAliases (#16839) Signed-off-by: shulcz <vshulcz@gmail.com>	2025-07-08 18:38:34 +02:00
Björn Rabenstein	c565e95808	Merge pull request #16825 from prometheus/beorn7/histogram promql: add tests to demonstrate extrapolation below zero	2025-07-08 16:42:56 +02:00
chenlujjj	a2735494e1	chore: complete error message in RegisterSDMetrics function (#14635) Signed-off-by: chenlujjj <953546398@qq.com>	2025-07-08 12:05:24 +00:00
Arthur Silva Sens	4b9d0fb92f	Revert: OTLP Support including scope metadata as metric labels (#16842) Reverts #16730 and #16760. This is being done because we've noticed a problem in the spec that could lead to name collisions if attributes name, version or schema_url are added to the scope. They would collide with the already reserved labels otel_scope_name, otel_scope_version and otel_scope_schema_url. Since this new configuration option never made it into a release, we can safely remove it from the 3.5 release. We'll sort this out for the 3.6 release. Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>	2025-07-08 10:37:19 +00:00
Ahmed Hassan	01be7bfb2e	add NumFloatSamples to TSDB block stats Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>	2025-07-07 13:48:18 -07:00
Lukasz Mierzwa	559fd44be6	Rename labels.go -> labels_slicelabels.go labels.go is now holding slicelabels code, so let's rename it. Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>	2025-07-07 12:37:42 +01:00
machine424	ffcba01c5a	chore: do not hardcode required versions in README.md; add links to the sources of truth. They are hard to keep up to date; the "go" one, for example, is "wrong" (not really, as an old 1.22 binary could still download/use newer toolchains...). Signed-off-by: machine424 <ayoubmrini424@gmail.com>	2025-07-07 08:42:31 +01:00
Charles Korn	1e58d792a5	storage/remote: fix "http: read on closed response body" errors if chunkedSeriesSet.Next is called again after the series set is exhausted (#16838) Signed-off-by: Charles Korn <charles.korn@grafana.com>	2025-07-07 09:23:34 +02:00
Michael Hoffmann	44ee5e2ad6	promql: fix topk error on NaN argument for non-existing series Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>	2025-07-07 06:19:39 +00:00
RaphSku	938e5cb62b	docs: Added documentation for promtool configuration with http.config.file (#16522) Includes an example. Signed-off-by: RaphSku <rapsku.dev@gmail.com>	2025-07-07 00:00:51 +02:00
beorn7	c0a13223e7	promql: add tests to demonstrate extrapolation below zero This shows how float counters cannot go below zero when extrapolating for rate/increase, and how histograms do not have that protection yet, leading to an overestimation of the rate/increase. This also demonstrates edge cases where the count extrapolation does not need to be limited, but an individual bucket still goes below zero. Signed-off-by: beorn7 <beorn@grafana.com>	2025-07-06 23:42:55 +02:00
Michael Hoffmann	21b1536b5a	storage: add projection fields to select hints (#16423) This commit adds Projection metadata to SelectHints so that downstream storage implementations can use it to save effort when answering Select calls. Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>	2025-07-06 12:57:19 +02:00
Arve Knudsen	f561aa795d	OTLP receiver: Generate `target_info` samples between the earliest and latest samples per resource (#16737) * OTLP receiver: Generate target_info samples between the earliest and latest samples per resource Modify the OTLP receiver to generate target_info samples between the earliest and latest samples per resource instead of only one for the latest timestamp. The samples are spaced lookback delta/2 apart. --------- Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2025-07-04 14:38:16 +00:00
Jon Kartago Lamida	819500bdbc	Add ByteSize method for Labels (#16717) Add a `ByteSize()` method to the different labels implementations. One use case is tracking the memory used by Labels. Signed-off-by: Jon Kartago Lamida <me@lamida.net>	2025-07-04 15:09:01 +01:00
sujal shah	4408a6bcaf	api: Create `/status/tsdb/blocks` endpoint. This endpoint serves block data to the client. Signed-off-by: sujal shah <sujalshah28092004@gmail.com>	2025-07-04 03:13:54 +05:30
machine424	c2d6e528e4	feat(discovery/kubernetes): allow attaching namespace metadata to endpointslice, endpoints and pod roles after injecting the labels for endpointslice, claude-4-sonnet helped transpose the code and tests to endpoints and pod roles fixes https://github.com/prometheus/prometheus/issues/9510 supersedes https://github.com/prometheus/prometheus/pull/13798 Signed-off-by: machine424 <ayoubmrini424@gmail.com> Co-authored-by: Paul BARRIE <paul.barrie.calmels@gmail.com>	2025-07-03 19:41:08 +02:00
Arve Knudsen	5a5424cbc1	Consolidate around prometheus/common/model.ValidationScheme (#16806) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2025-07-03 15:37:46 +02:00
Bartlomiej Plotka	419d436a44	Merge pull request #16822 from prometheus/bump-otlptranslator Bump otlptranslator to latest SHA	2025-07-03 12:40:31 +01:00
Matthias Loibl	61064cb774	Merge pull request #16819 from jscheffner/prometheus-dashboard-uid mixin: add uid to prometheus overview dashboard	2025-07-03 11:16:05 +02:00
Julien	011c7fe87d	Merge pull request #16820 from prymitive/discoveryRace discovery: fix a race in ApplyConfig while Prometheus is being stopped	2025-07-03 10:52:59 +02:00
dependabot[bot]	ce2e48f39e	build(deps): bump github.com/open-telemetry/opentelemetry-collector-contrib/processor/deltatocumulativeprocessor Bumps [github.com/open-telemetry/opentelemetry-collector-contrib/processor/deltatocumulativeprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib) from 0.128.0 to 0.129.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-collector-contrib/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CHANGELOG-API.md) - [Commits](https://github.com/open-telemetry/opentelemetry-collector-contrib/compare/v0.128.0...v0.129.0) --- updated-dependencies: - dependency-name: github.com/open-telemetry/opentelemetry-collector-contrib/processor/deltatocumulativeprocessor dependency-version: 0.129.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2025-07-03 08:10:56 +00:00
github-actions[bot]	3c25eb2a0d	Merge pull request #16815 from prometheus/dependabot/go_modules/github.com/oklog/run-1.2.0 build(deps): bump github.com/oklog/run from 1.1.0 to 1.2.0	2025-07-03 10:09:10 +02:00
Ahmed Hassan	6d77b47d13	add numHistogramSamples to block stats Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>	2025-07-02 19:52:04 -07:00
Arthur Silva Sens	0502f2d8fb	Bump otlptranslator to latest SHA Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>	2025-07-02 14:55:51 -03:00
Bryan Boreham	74aca682b7	Merge pull request #16807 from bboreham/test-sizeoflabels [TESTS] Labels: Add a test for SizeOfLabels	2025-07-02 18:44:10 +01:00
Lukasz Mierzwa	b49d143595	Fix a race in discovery manager ApplyConfig & shutdown If we call ApplyConfig() at the same time the manager is being stopped, we might end up hanging forever. This is because ApplyConfig() will try to cancel obsolete providers and wait until they are cancelled. It's done by setting a done() function that calls Done() on a sync.WaitGroup:
```
if len(prov.newSubs) == 0 {
	wg.Add(1)
	prov.done = func() {
		wg.Done()
	}
}
```
then calling prov.cancel() and finally blocking on a wg.Wait() call until all providers have run their done() function. For each provider there is a goroutine created by calling Manager.startProvider(*Provider):
```
func (m *Manager) startProvider(ctx context.Context, p *Provider) {
	m.logger.Debug("Starting provider", "provider", p.name, "subs", fmt.Sprintf("%v", p.subs))
	ctx, cancel := context.WithCancel(ctx)
	updates := make(chan []*targetgroup.Group)
	p.mu.Lock()
	p.cancel = cancel
	p.mu.Unlock()
	go p.d.Run(ctx, updates)
	go m.updater(ctx, p, updates)
}
```
It creates a context that can be cancelled, and that cancel function becomes prov.cancel. This is what ApplyConfig will call. If we look at the body of the updater() method:
```
func (m *Manager) updater(ctx context.Context, p *Provider, updates chan []*targetgroup.Group) {
	// Ensure targets from this provider are cleaned up.
	defer m.cleaner(p)
	for {
		select {
		case <-ctx.Done():
			return
		[...]
```
we can see that it will exit if that context is cancelled, and that will trigger a call to Manager.cleaner(). That cleaner() is where done() is called. So ApplyConfig() -> calls cancel() -> causes cleaner() to be executed -> calls done().
cancel() is also called from the cancelDiscoverers() method, which is called by Manager.Run() when the Manager is stopping:
```
func (m *Manager) Run() error {
	go m.sender()
	<-m.ctx.Done()
	m.cancelDiscoverers()
	return m.ctx.Err()
}
```
The problem is that if we call ApplyConfig and stop the manager at the same time, we might end up with:
- We call Manager.ApplyConfig()
- We stop the Manager
- Manager.cancelDiscoverers() is called
- Provider.cancel() is called for every Provider
- cancel() causes the provider context to be cancelled, which terminates updater() for the given Provider
- Cancelling the context causes the cleaner() method to be called for the given Provider
- cleaner() calls done() and exits
- The Provider is considered stopped at this point; there is no goroutine running that will call done() anymore
- ApplyConfig iterates providers and decides that one is obsolete and must be stopped
- It sets a custom done() function body with a WaitGroup.Done() call in it
- Then ApplyConfig waits until all Providers run done()
- But they are all stopped and no done() will be run
- We wait forever
This only happens if cancelDiscoverers() runs before ApplyConfig. If ApplyConfig runs first, done() will be called; if cancelDiscoverers() runs first, it stops the updater() instances, so done() is never called. Part of the problem is that there is no distinction between running and stopped providers. There is a Provider.IsStarted() method that returns a bool based on the value of the cancel function, but ApplyConfig doesn't check it. The second problem is that although there is a mutex on a Provider, it's not used much in the code, so two goroutines can try to read and/or write provider.cancel and/or provider.done at the same time, making a race more likely. The easiest way to fix it is to check whether the provider is started inside ApplyConfig, so we don't try to stop a provider that's already stopped. For that we need to mark it as stopped after cancel() is called, by setting cancel to nil.
This also needs better lock usage to avoid different parts of the code trying to set cancel and done at the same time. Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>	2025-07-02 16:03:10 +01:00
Lukasz Mierzwa	357e652044	Add a test for a rare shutdown hang When doing a config reload that needs to stop some providers while also sending SIGTERM to Prometheus at the same time, Prometheus can sometimes hang:
1: sync.WaitGroup.Wait [83 minutes] [Created by run.(*Group).Run in goroutine 1 @ group.go:37]
    sync sema.go:110 runtime_SemacquireWaitGroup(*uint32(#166))
    sync waitgroup.go:118 (*WaitGroup).Wait(*WaitGroup(#23))
    discovery manager.go:276 (*Manager).ApplyConfig(#23, #167)
    main main.go:964 main.func5(#120)
    main main.go:1505 reloadConfig({#183, 0x1b}, 1, #40, #43, #50, {#31, 0xa, 0})
    main main.go:1182 main.func22()
    run group.go:38 (*Group).Run.func1(*Group(#26), #51)
Add a test for it. Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>	2025-07-02 16:01:42 +01:00
wmTJc9IK0Q	c481aaf762	codemirror-promql: Preserve source files in npm package (#16804) * Preserve source files in codemirror-promql package This allows for sourcemaps to work when the package is imported via ESM-native CDNs such as esm.sh Signed-off-by: wmTJc9IK0Q <171362836+wmTJc9IK0Q@users.noreply.github.com> * Preserve source files in lezer-promql package Signed-off-by: wmTJc9IK0Q <171362836+wmTJc9IK0Q@users.noreply.github.com> --------- Signed-off-by: wmTJc9IK0Q <171362836+wmTJc9IK0Q@users.noreply.github.com>	2025-07-02 15:31:02 +02:00

16177 Commits