prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2026-01-08 10:11:03 +01:00

Author	SHA1	Message	Date
bwplotka	efdfb8fed6	refactor(appenderV2): 1:1 copy of head append test files for v2 (starting point) Signed-off-by: bwplotka <bwplotka@gmail.com>	2025-12-09 09:53:39 +00:00
bwplotka	129650df9d	refactor(appenderV2): 1:1 copy of head_append.go -> head_append_v2.go (starting point) Signed-off-by: bwplotka <bwplotka@gmail.com>	2025-12-09 09:53:39 +00:00
Bartlomiej Plotka	f6ca7145ca	refactor(tsdb): use one test newTestDB constructor (#17638) For tests only, we had various ways of opening DB. Reduced to one instead of: * Open * newTestDB * newTestDBOpts * openTestDB This is so that https://github.com/prometheus/prometheus/pull/17629 is smaller and a bit easier. Also for test maintainability and consistency. Signed-off-by: bwplotka <bwplotka@gmail.com>	2025-12-03 07:55:48 +00:00
Łukasz Mierzwa	8a1086a128	feat: Add flag that blocks lvl 1 compactions until upload is confirmed in an external JSON file (#17435) * Delay compactions until Thanos uploads all blocks Using Thanos sidecar with Prometheus requires us to disable TSDB compactions on Prometheus side by setting --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the same value. See https://thanos.io/tip/components/sidecar.md. The main problem this avoids is that Prometheus might compact a given block before Thanos uploads it, creating a gap in Thanos metrics. Thanos does not upload compacted blocks because that would upload the same sample multiple times. You can tell Thanos to upload compacted blocks but that is aimed at one-time migrations. This patch creates a bridge between Thanos and Prometheus by allowing Prometheus to read the shipper file Thanos creates, where it tracks which blocks were already uploaded, and using that data delays compaction of blocks until they are marked as uploaded by Thanos. Thanks to this both services can coordinate with each other (in a way) and we can stop disabling compaction on Prometheus side when Thanos uploads are enabled. The reason to have this is that disabling compactions has a very dramatic performance cost. Since most time series exist for longer than a single block duration (2h by default) large chunks of block index will reference the same series, so 10 * 2h blocks will each have an index that is usually fairly big and is almost the same for all 10 blocks. Compaction de-duplicates the index so merging 10 blocks together would leave us with a single index that is around the same size as each of these 10 2h blocks would have (plus some extra for series that only exist in some blocks, but not all). Every range query that iterates over all 10 blocks would then have to read each index and so we're doing 10x more work than if we had a single compacted block.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com> * Rename structs and functions to make this more generic Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com> * Address review comments Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com> * Cache UploadMeta for 1 minute Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com> --------- Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>	2025-12-02 10:39:45 +00:00
Björn Rabenstein	b8d19543b8	Add histogram validation in remote-read and during reducing resolution (#17561) ReduceResolution is currently called before validation during ingestion. This will cause a panic if there are not enough buckets in the histogram. If there are too many buckets, the spurious buckets are ignored, and therefore the error in the input histogram is masked. Furthermore, invalid negative offsets might cause problems, too. Therefore, we need to do some minimal validation in reduceResolution. Fortunately, it is easy and shouldn't slow things down. Sadly, it requires returning errors, which triggers a bunch of code changes. Even here there is a bright side: we can get rid of a few panics. (Remember: Don't panic!) In different news, we haven't done a full validation of histograms read via remote-read. This is not so much a security concern (as you can throw off Prometheus easily by feeding it bogus data via remote-read) but more that remote-read sources might be makeshift and could accidentally create invalid histograms. We really don't want to panic in that case. So this commit does not only add a check of the spans and buckets as needed for resolution reduction but also a full validation during remote-read. Signed-off-by: beorn7 <beorn@grafana.com>	2025-11-21 00:22:24 +01:00
0xkato	ae00fd45ab	tsdb: guard chunk length overflow in head chunk reader (#17533) Signed-off-by: 0xkato <0xkkato@gmail.com>	2025-11-15 21:09:00 +01:00
Bryan Boreham	1240402620	Merge pull request #17439 from bboreham/faster-postings tsdb: couple of postings optimizations	2025-11-14 18:36:34 +01:00
Bartlomiej Plotka	f50ff0a40a	feat: rename CreatedTimestamp to StartTimestamp (#17523) Partially fixes https://github.com/prometheus/prometheus/issues/17416 by renaming all CT* names to ST* in the whole codebase except RW2 (this is done in separate [PR](https://github.com/prometheus/prometheus/pull/17411)) and PrometheusProto exposition proto. ``` CreatedTimestamp -> StartTimestamp CreatedTimeStamp -> StartTimestamp created_timestamp -> start_timestamp CT -> ST ct -> st ``` Signed-off-by: bwplotka <bwplotka@gmail.com>	2025-11-13 14:17:51 +00:00
Ben Ye	2e609511bb	Register missing metric prometheus_tsdb_sample_ooo_delta (#17477) * register missing metric prometheus_tsdb_sample_ooo_delta Signed-off-by: yeya24 <benye@amazon.com> * changelog Signed-off-by: yeya24 <benye@amazon.com> --------- Signed-off-by: yeya24 <benye@amazon.com>	2025-11-11 11:07:08 +01:00
Bryan Boreham	c1e0ab11c6	[PERF] TSDB: Speed up intersectPostings.Next Check if the next position is already a match, in which case we don't have to call `Seek`. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2025-11-10 17:00:19 +00:00
Bryan Boreham	0e1e7441e4	[PERF] TSDB: ListPostings: check next item before binary search It is fairly common that the next item is the one we want, and cheap to check. We could also start the binary search one position on, but strangely that slows it down. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2025-11-10 17:00:19 +00:00
Bryan Boreham	be8307db58	[TEST] Refactor BenchmarkIntersect to reduce memory allocations Extract functions which pre-create all the memory for the benchmark itself. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2025-11-10 17:00:15 +00:00
Bryan Boreham	393ab9e12e	[TEST] TSDB: More realistic BenchmarkIntersect 100,000 matchers is not something that could happen while using Prometheus. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2025-11-10 16:58:14 +00:00
Jan Fajerski	49254f45e9	Merge pull request #17351 from bboreham/simplify-precreate TSDB: Allocate series ID after seriesLifecycleCallback; simplify code.	2025-11-07 14:39:51 +01:00
Ben Kochie	204249fcb5	Update golangci-lint (#17478) * Update golangci-lint to v2.6.0 * Fixup various linting issues. * Fixup deprecations. * Add exception for `labels.MetricName` deprecation. Signed-off-by: SuperQ <superq@gmail.com>	2025-11-05 13:47:34 +01:00
Ben Kochie	48956f60d7	Update modernize (#17471) Apply additional Go modernize tool improvements. Signed-off-by: SuperQ <superq@gmail.com>	2025-11-04 05:13:49 +00:00
geogrego	58dbe927d5	docs: minor improvement for docs Signed-off-by: geogrego <geogrego@outlook.com>	2025-10-29 14:42:14 +08:00
Fiona Liao	b004db49af	Reduce samples for TestRuntimeRetentionConfigChange (#17422) * Reduce samples for TestRuntimeRetentionConfigChange --------- Signed-off-by: Fiona Liao <fiona.liao@grafana.com>	2025-10-28 18:23:32 +01:00
Arve Knudsen	df8a9076b9	tsdb: Reduce TestHeadSeriesChunkRace number of iterations to 100 (#17410) Reduce tsdb.TestHeadSeriesChunkRace number of iterations from 1000 to 100, to stop this test from timing out under CI. Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2025-10-28 13:57:20 +01:00
Minh Nguyen	ad4b59c504	tsdb: Deprecate retention flags; add tsdb.retention runtime configuration (#17026 ) * Move storage from CL to config file Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> * Fix .md Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> * run make cli-documentation Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> * fix Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> * run make cli-documentation Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> * nit_fixed Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> * fix Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> * add test and update configuration.md Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> * fix lint Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> --------- Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>	2025-10-27 14:51:33 +00:00
György Krajcsovits	18efd9d629	feat(ui): mark native histograms as stable in ui strings Plus some docstrings Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-10-24 12:32:15 +02:00
Ayoub Mrini	504587c724	chore(direct_io): fix constructor's name (#17371) Signed-off-by: machine424 <ayoubmrini424@gmail.com>	2025-10-23 11:35:16 +02:00
Bryan Boreham	42b52ecc4b	TSDB: Allocate series ID after seriesLifecycleCallback This callback is not used by Prometheus, but in downstream projects it is wasteful to allocate an ID only to abandon it. Remove lengthy comment which I feel is distracting from the flow. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2025-10-17 11:06:22 +01:00
Bryan Boreham	2852c9c431	[REFACTOR] TSDB: Simplify series creation Refactor the code so that everything proceeds linearly. Also renamed `getOrSet` to `setUnlessAlreadySet` to emphasise that the caller is expecting it not to be set. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2025-10-17 10:46:22 +01:00
beorn7	ad7d1aed99	Phase out native histogram feature flag The detailed plan for this is laid out in https://github.com/prometheus/prometheus/issues/16572 . This commit adds a global and local scrape config option `scrape_native_histograms`, which has to be set to true to ingest native histograms. To ease the transition, the feature flag is changed to simply set the default of `scrape_native_histograms` to true. Further implications: - The default scrape protocols now depend on the `scrape_native_histograms` setting. - Everywhere else, histograms are now "on by default". Documentation beyond the one for the feature flag and the scrape config are deliberately left out. See https://github.com/prometheus/prometheus/pull/17232 for that. Signed-off-by: beorn7 <beorn@grafana.com>	2025-10-15 14:50:52 +02:00
Björn Rabenstein	1caac94026	Merge pull request #17302 from prometheus/release-3.7 Merge release-3.7 back into main.	2025-10-07 18:40:08 +02:00
beorn7	e2aed2cd27	tsdb: Disable more tests on MS Windows Signed-off-by: beorn7 <beorn@grafana.com>	2025-10-07 16:34:59 +02:00
Björn Rabenstein	68e4d4e5eb	Merge pull request #17298 from prometheus/release-3.7 Merging back release-3.7 branch into master	2025-10-07 16:23:36 +02:00
Björn Rabenstein	f2fc492473	Merge pull request #17284 from linasm/custom-bucket-bounds-match-fn NHCB: Separate CustomBucketBoundsMatch from FloatBucketsMatch	2025-10-07 15:38:59 +02:00
beorn7	51c8e55835	tsdb: Do not track stFloat in typesInBatch explicitly Signed-off-by: beorn7 <beorn@grafana.com>	2025-10-07 15:01:22 +02:00
beorn7	5f582a7e1f	tsdb: Remove leftover debug fmt.Println Signed-off-by: beorn7 <beorn@grafana.com>	2025-10-07 14:58:25 +02:00
Bartlomiej Plotka	a4da440dad	fix: Fix slicelabels corruption when used with proto decoding (#17150) * fix: Fix slicelabels corruption when used with proto decoding Alternative to https://github.com/prometheus/prometheus/pull/16957/ Signed-off-by: bwplotka <bwplotka@gmail.com> * addressed comments Signed-off-by: bwplotka <bwplotka@gmail.com> --------- Signed-off-by: bwplotka <bwplotka@gmail.com>	2025-10-07 12:06:48 +01:00
György Krajcsovits	d11ee103ac	perf(tsdb): reuse map of sample types to speed up head appender While investigating +10% CPU in v3.7 release, found that ~5% is from expanding the types map. Try reuse. Also fix a linter error. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-10-07 08:31:20 +02:00
György Krajcsovits	c26a5390aa	perf(tsdb): reuse map of sample types to speed up head appender While investigating +10% CPU in v3.7 release, found that ~5% is from expanding the types map. Try reuse. Also fix a linter error. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-10-06 21:44:34 +02:00
Linas Medziunas	8caf1f1c41	[NHCB] Separate CustomBucketBoundsMatch from floatBucketsMatch Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>	2025-10-05 22:38:07 +03:00
Patryk Prus	dc3e6af91a	tsdb: Fix appended sample count metrics when converting float staleness markers to histograms (#17241) tsdb: Fix appended sample count metrics when converting histogram staleness markers Signed-off-by: Patryk Prus <p@trykpr.us> Signed-off-by: Björn Rabenstein <github@rabenste.in> Co-authored-by: Björn Rabenstein <github@rabenste.in>	2025-09-30 16:49:54 +00:00
George Krajcsovits	35d9f28c87	Update tsdb/record/record.go Co-authored-by: Björn Rabenstein <beorn@grafana.com> Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>	2025-09-24 14:27:37 +02:00
György Krajcsovits	30f941c57c	fix(wal): ignore invalid native histogram schemas on load Reduce the resolution of histograms as needed and ignore invalid schemas while emitting a warning log. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-09-24 11:41:25 +02:00
György Krajcsovits	a5a6413c1a	better errors naming and formatting, typo fixes Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-09-23 11:20:55 +02:00
György Krajcsovits	b6df8d3274	feat(chunkenc): allow more native histograms schemas Allow -9..52 schemas instead of just -4..8, but reduce resolution to 8 if above. The reduce code path will be slow, but we only expect it to happen if TSDB already has higher resolution samples and we are in a rollback. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> # Conflicts: # model/histogram/generic.go	2025-09-23 11:20:48 +02:00
György Krajcsovits	5b39b79f5a	refactor error creation and tests Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-09-19 09:26:34 +02:00
György Krajcsovits	b99378f2c4	Merge remote-tracking branch 'origin/main' into krajo/native-histogram-schema-validation	2025-09-19 08:59:00 +02:00
György Krajcsovits	267be7dc20	fix(chunkenc): error out when reading unknown histogram schemas from chunks Otherwise higher level code like PromQL needs to constantly check if it can handle the samples. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-09-18 09:21:03 +02:00
beorn7	bd0bf66f31	tsdb: Include floatHistograms in headAppender.Rollback() Signed-off-by: beorn7 <beorn@grafana.com>	2025-09-17 19:22:25 +02:00
beorn7	b1fbf4f1e2	tsdb: Refactor staleness marker handling With the fixed commit order, we can now handle the conversion of float staleness markers to histogram staleness markers in a more direct way. Signed-off-by: beorn7 <beorn@grafana.com>	2025-09-17 19:22:25 +02:00
beorn7	7e82bdb75b	tsdb: Fix commit order for mixed-typed series Fixes https://github.com/prometheus/prometheus/issues/15177 The basic idea here is to divide the samples to be committed into (sub) batches whenever we detect that the same series receives a sample of a type different from the previous one. We then commit those batches one after another, and we log them to the WAL one after another, so that we hit both birds with the same stone. The cost of the stone is that we have to track the sample type of each series in a map. Given the amount of things we already track in the appender, I hope that it won't make a dent. Note that this even addresses the NHCB special case in the WAL. This does a few other things that I could not resist to pick up on the go: - It adds more zeropool.Pools and uses the existing ones more consistently. My understanding is that this was merely an oversight. Maybe the additional pool usage will compensate for the increased memory demand of the map. - Create the synthetic zero sample for histograms a bit more carefully. So far, we created a sample that always went into its own chunk. Now we create a sample that is compatible enough with the following sample to go into the same chunk. This changed the test results quite a bit. But IMHO it makes much more sense now. - Continuing past efforts, I changed more namings of `Samples` into `Floats` to keep things consistent and less confusing. (Histogram samples are also samples.) I still avoided changing names in other packages. - I added a few shortcuts `h := a.head`, saving many characters. TODOs: - Address @krajorama's TODOs about commit order and staleness handling. Signed-off-by: beorn7 <beorn@grafana.com>	2025-09-17 19:22:25 +02:00
beorn7	46cfc9fb99	tsdb: Extend TestDataNotAvailableAfterRollback This exposes the omission of float histograms from the rollback. Signed-off-by: beorn7 <beorn@grafana.com>	2025-09-17 19:22:25 +02:00
Bryan Boreham	11c49151b7	[REFACTOR] TSDB chunks: replace magic numbers with constants (#17095) For size of header and position of flags byte. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2025-09-02 16:05:21 +01:00
Bryan Boreham	aa12c0d4c3	Merge pull request #17074 from prymitive/logs TSDB: Log when GC / block write starts	2025-09-02 12:55:12 +01:00
Bryan Boreham	8e133e100f	Merge pull request #17081 from prometheus/superq/if_err_nil tsdb: Fixup err nil checks	2025-09-02 12:37:51 +01:00

1453 Commits
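
Note on commits 0e1e7441e4 and c1e0ab11c6: both describe the same idea, which is to check whether the very next item already satisfies the request before paying for a `Seek` or binary search. The following is a minimal sketch of that idea only, not the Prometheus implementation. The type `listPostings`, its fields, and the `searches` counter are hypothetical names made up for illustration, and IDs are assumed to be non-zero so that the zero value of `cur` means "not started".

```go
package main

import (
	"fmt"
	"sort"
)

// listPostings is a hypothetical, simplified iterator over a sorted
// slice of series IDs. It is not the type used in Prometheus.
type listPostings struct {
	list     []uint64 // remaining items, sorted ascending
	cur      uint64   // current value, valid after a successful Seek
	searches int      // binary searches performed, kept only for illustration
}

// Seek advances to the first value >= x and reports whether one exists.
func (it *listPostings) Seek(x uint64) bool {
	// The current value may already satisfy the request.
	if it.cur >= x {
		return true
	}
	if len(it.list) == 0 {
		return false
	}
	// Fast path: the next item is often the one we want, and it is
	// cheap to check, so try it before doing a binary search.
	if it.list[0] >= x {
		it.cur = it.list[0]
		it.list = it.list[1:]
		return true
	}
	// Slow path: binary search over the remaining items.
	it.searches++
	i := sort.Search(len(it.list), func(i int) bool { return it.list[i] >= x })
	if i < len(it.list) {
		it.cur = it.list[i]
		it.list = it.list[i+1:]
		return true
	}
	it.list = nil
	return false
}

func main() {
	it := &listPostings{list: []uint64{2, 3, 5, 8, 13, 21}}
	for _, x := range []uint64{2, 3, 13, 22} {
		ok := it.Seek(x)
		fmt.Println("seek", x, "found", ok, "cur", it.cur, "binary searches", it.searches)
	}
}
```

In this sketch the first two seeks are served by the fast path, and only the later seeks that skip over items fall back to the binary search. Whether that trade-off pays off in practice depends on the access pattern, which is what the benchmarks mentioned in the commits above are for.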