prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-09-21 13:51:00 +02:00

Author	SHA1	Message	Date
beorn7	7e82bdb75b	tsdb: Fix commit order for mixed-typed series Fixes https://github.com/prometheus/prometheus/issues/15177 The basic idea here is to divide the samples to be committed into (sub) batches whenever we detect that the same series receives a sample of a type different from the previous one. We then commit those batches one after another, and we log them to the WAL one after another, so that we hit both birds with the same stone. The cost of the stone is that we have to track the sample type of each series in a map. Given the amount of things we already track in the appender, I hope that it won't make a dent. Note that this even addresses the NHCB special case in the WAL. This does a few other things that I could not resist to pick up on the go: - It adds more zeropool.Pools and uses the existing ones more consistently. My understanding is that this was merely an oversight. Maybe the additional pool usage will compensate for the increased memory demand of the map. - Create the synthetic zero sample for histograms a bit more carefully. So far, we created a sample that always went into its own chunk. Now we create a sample that is compatible enough with the following sample to go into the same chunk. This changed the test results quite a bit. But IMHO it makes much more sense now. - Continuing past efforts, I changed more namings of `Samples` into `Floats` to keep things consistent and less confusing. (Histogram samples are also samples.) I still avoided changing names in other packages. - I added a few shortcuts `h := a.head`, saving many characters. TODOs: - Address @krajorama's TODOs about commit order and staleness handling. Signed-off-by: beorn7 <beorn@grafana.com>	2025-09-17 19:22:25 +02:00
beorn7	46cfc9fb99	tsdb: Extend TestDataNotAvailableAfterRollback This exposes the omission of float histograms from the rollback. Signed-off-by: beorn7 <beorn@grafana.com>	2025-09-17 19:22:25 +02:00
beorn7	747c5ee2b1	Apply analyzer "modernize" to the whole codebase See https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize for details. This ran into a few issues (arguably bugs in the modernize tool), which I will fix in the next commit, so that we have transparency what was done automatically. Beyond those hiccups, I believe all the changes applied are legitimate. Even where there might be no tangible direct gain, I would argue it's still better to use the "modern" way to avoid micro discussions in tiny style PRs later. Signed-off-by: beorn7 <beorn@grafana.com>	2025-08-27 14:48:41 +02:00
Bryan Boreham	498f63e60b	Merge pull request #17029 from pr00se/wal-checkpoint-dropped-samples TSDB: use timestamps rather than WAL segment numbers to track how long deleted series should be retained in checkpoints	2025-08-20 11:15:10 +01:00
pipiland2612	82a4b12507	Add t.parallel() for ./tsdb Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>	2025-08-12 14:12:42 +02:00
Patryk Prus	0fea41ed53	Refactor keep function to work for both agent and non-agent implementations Signed-off-by: Patryk Prus <p@trykpr.us>	2025-08-08 14:12:47 -04:00
Patryk Prus	218558f543	Store mint rather than the last WAL segment in head.walExpiries during head GC Signed-off-by: Patryk Prus <p@trykpr.us>	2025-08-08 14:12:41 -04:00
Matthieu MOREL	cef219c31c	chore: enable unused-receiver rule from revive Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2025-08-04 09:43:33 +00:00
socialsister	869c946370	chore: fix some minor issues in comments Signed-off-by: socialsister <seekseat@qq.com>	2025-07-16 11:24:42 +01:00
liangmulu	b1a7df2c0c	chore: fix some minor issues in comments Signed-off-by: liangmulu <liangmulu@outlook.com>	2025-07-09 18:05:41 +08:00
Ayoub Mrini	2edc3ed6c5	feat(tsdb): introduce --use-uncached-io feature flag and allow using it for chunks writing (#15365) Signed-off-by: machine424 <ayoubmrini424@gmail.com> Signed-off-by: Ayoub Mrini <ayoubmrini424@gmail.com>	2025-05-21 14:42:30 +02:00
Arve Knudsen	e7e3ab2824	Fix linting issues found by golangci-lint v2.0.2 (#16368) * Fix linting issues found by golangci-lint v2.0.2 --------- Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2025-05-03 19:05:13 +02:00
Bryan Boreham	a11772234d	Merge pull request #16333 from colega/fix-series-create-gc-race fix: race condition between series creation and garbage collection	2025-04-17 12:15:11 +01:00
Ryan Wu	7d73c1d3f8	refactor[discovery, tsdb]: simplify error handling and remove redundant checks (#16328) * refactor: simplify error handling and remove redundant checks Signed-off-by: Ryan Wu <rongjun0821@gmail.com> * Add the comment for return of reloading blocks failure Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com> Signed-off-by: Ryan Wu <rongjun0821@gmail.com> * Add the comment for return of reloading blocks failure Signed-off-by: Ryan Wu <rongjun0821@gmail.com> --------- Signed-off-by: Ryan Wu <rongjun0821@gmail.com> Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>	2025-03-27 12:20:59 +01:00
Oleg Zaytsev	e4fe8d8684	Create memSeries with pendingCommit=true This fixes TestHead_RaceBetweenSeriesCreationAndGC. Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>	2025-03-27 11:11:57 +01:00
Fiona Liao	37c2ebb5fd	Make out-of-order native histograms flag a no-op and always enable (#16207) * Remove experimental out-of-order native histogram flag This feature has been available in Prometheus since September 2024, and has no known issues. Therefore proposing to remove the flag entirely and always have it on. Note that there are still two settings that need to be configured (out-of-order time window > 0 and native histograms enabled) for this feature to work. Signed-off-by: Fiona Liao <fiona.liao@grafana.com> * Update CHANGELOG Signed-off-by: Fiona Liao <fiona.liao@grafana.com> * Keep feature flag with warning Signed-off-by: Fiona Liao <fiona.liao@grafana.com> * Update CHANGELOG Signed-off-by: Fiona Liao <fiona.liao@grafana.com> * Update tsdb/head_append.go Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com> Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com> * Update CHANGELOG.md Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com> Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com> * Update tsdb/head_append.go Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com> Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com> * Additional cleanup of comments and test names Signed-off-by: Fiona Liao <fiona.liao@grafana.com> --------- Signed-off-by: Fiona Liao <fiona.liao@grafana.com> Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com> Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>	2025-03-18 10:59:02 +00:00
Bartlomiej Plotka	7a7bc65237	Add util/compression package to consolidate snappy/zstd use in Prometheus. (#16156) # Conflicts: # tsdb/db_test.go Apply suggestions from code review tmp Addressed comments. Update util/compression/buffers.go Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> Co-authored-by: Arthur Silva Sens <arthursens2005@gmail.com>	2025-03-10 10:36:26 +00:00
Patryk Prus	61aa82865d	TSDB: keep duplicate series records in checkpoints while their samples may still be present (#16060) Renames the head's deleted map to walExpiries, and creates entries for any duplicate series records encountered during WAL replay, with the expiry set to the highest current WAL segment number. Any subsequent WAL checkpoints will see the duplicate series entry in the walExpiries map, and keep the series record until the last WAL segment that could contain its samples is deleted. Other considerations: - WBL: series records aren't written to the WBL, so there are no duplicates to deal with - agent mode: has its own WAL replay logic that handles duplicate series records differently, and is outside the scope of this PR	2025-03-05 13:45:08 -05:00
Arve Knudsen	7cbf749096	Upgrade to github.com/oklog/ulid/v2 (#16168) Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2025-03-05 16:03:25 +01:00
Matthieu MOREL	c7d4b53ec1	chore: enable unused-parameter from revive Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2025-02-19 19:50:28 +01:00
Bryan Boreham	b74cebf6bf	Merge pull request #12920 from prymitive/compactLock Fix locks in db.reloadBlocks()	2025-02-10 17:35:09 +00:00
Bryan Boreham	2f615a200d	tsdb tests: restrict some 'defer' operations 'defer' only runs at the end of the function, so introduce some more functions / move the start, so that 'defer' can run at the end of the logical block. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2025-01-27 19:59:43 +00:00
Łukasz Mierzwa	92788d313a	Remove TestTombstoneCleanRetentionLimitsRace This test ensures that running db.reloadBlocks() and db.CleanTombstones() at the same time doesn't race. The problem is that CleanTombstones() is a public method while reloadBlocks() is internal. CleanTombstones() sets db.cmtx lock while reloadBlocks() is not protected by any locks at all; it expects the public method through which it was called to do it. So having a race between these two is not unexpected and we shouldn't really be testing this. db.cmtx ensures that no other function can be modifying the list of open blocks and so the scenario tested here cannot happen. If it did happen, it would only be because some other method doesn't acquire the db.cmtx lock, something this test cannot detect. Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>	2025-01-09 17:06:03 +00:00
György Krajcsovits	1e420ef373	Merge branch 'main' into cedwards/nhcb-wal-wbl # Conflicts: # tsdb/tsdbutil/histogram.go	2025-01-02 12:50:19 +01:00
Joel Beckmeyer	39f5a07236	fix TestOOOHeadChunkReader_Chunk on 32-bit Signed-off-by: Joel Beckmeyer <joel@beckmeyer.us>	2024-12-16 10:45:07 -05:00
Carrie Edwards	1933ccc9be	Fix test	2024-12-06 14:55:19 -08:00
Carrie Edwards	a046417bc0	Use new record type only for NHCB	2024-12-06 13:46:20 -08:00
Carrie Edwards	6684344026	Rename old histogram record type, use old names for new records	2024-12-05 09:21:47 -08:00
Fiona Liao	c599d37668	Always return unknown hint for first sample in non-gauge histogram chunk (#15343) Always return unknown hint for first sample in non-gauge histogram chunk --------- Signed-off-by: Fiona Liao <fiona.liao@grafana.com> Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2024-11-12 15:14:06 +01:00
Ben Ye	140f4aa9ae	feat: Allow customizing TSDB postings decoder (#13567) * allow customizing TSDB postings decoder --------- Signed-off-by: Ben Ye <benye@amazon.com>	2024-11-11 07:59:24 +01:00
Matthieu MOREL	af1a19fc78	enable errorf rule from perfsprint linter Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2024-11-06 16:50:36 +01:00
Alban Hurtaud	4b56af7eb8	Add hidden flag for the delayed compaction random time window (#14919) * Add hidden flag for the delayed compaction random time window Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com> * Update cmd/prometheus/main.go Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com> Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com> * Update cmd/prometheus/main.go Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com> Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com> * Update tsdb/db.go Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com> Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com> * Fix flag name according to review - add test for delay Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com> * Fix after main rebase Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com> * Implement review comments Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com> * Update generatedelaytest to try with limit values Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com> --------- Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com> Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com> Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>	2024-11-04 08:26:26 +01:00
György Krajcsovits	e6a682f046	Reproduce populateWithDelChunkSeriesIterator corrupting chunk meta When handling recoded histogram chunks the min time of the chunk is updated by mistake. It should only update when the chunk is completely new. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2024-10-18 10:34:22 +02:00
machine424	ab2475c426	test(tsdb): add a reproducer for https://github.com/prometheus/prometheus/issues/14422 Signed-off-by: machine424 <ayoubmrini424@gmail.com>	2024-10-15 20:39:25 +02:00
TJ Hoplock	6ebfbd2d54	chore!: adopt log/slog, remove go-kit/log For: #14355 This commit updates Prometheus to adopt stdlib's log/slog package in favor of go-kit/log. As part of converting to use slog, several other related changes are required to get prometheus working, including: - removed unused logging util func `RateLimit()` - forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger - move some of the json file logging functionality to use prom/common package functionality - refactored some of the new json file logging for scraping - changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers - updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition - added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>	2024-10-07 15:58:50 -04:00
Carrie Edwards	14e3c05ce8	tsdb: Add support for ingestion of out-of-order native histogram samples (#14546) Add support for ingesting OOO native histograms * Add flag for enabling and disabling OOO native histogram ingestion * Update OOO querying tests to include native histogram samples * Add OOO head tests * Add test for OOO native histogram counter reset headers Signed-off-by: Carrie Edwards <edwrdscarrie@gmail.com> Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Co-authored by: Carrie Edwards <edwrdscarrie@gmail.com> Co-authored by: Jeanette Tan <jeanette.tan@grafana.com> Co-authored by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Co-authored by: Fiona Liao <fiona.liao@grafana.com>	2024-09-17 11:19:06 +02:00
Nathan Baulch	50cd453c8f	chore: Fix typos (#14868) * Fix typos --------- Signed-off-by: Nathan Baulch <nathan.baulch@gmail.com>	2024-09-10 22:32:03 +02:00
György Krajcsovits	41c076196e	New cases in Test_ChunkQuerier_OOOQuery and Test_Querier_OOOQuery Case 1: OOO in-memory head chunk overlaps with first mmaped in-order chunk. Query: |----------------------------------------------------------------| InO: |------mmap---------------||---------mem----------------------| OOO: |-----mem-----------| This triggers ChunkOrIterableWithCopy not including OOO head chunks bug. Similar to #14693 however testing the end of the interval doesn't trigger the problem because there the in-order head chunk will be trimmed with a tombstone, causing the code to switch to ChunkOrIterable which was fixed. See `a36d1a8a92/tsdb/querier.go (L646)` where len(p.bufIter.Intervals) will be non zero, because it includes the tombstone to trim the result to the query max time. Thus a new test is added to check the overlap at the beginning of the interval that has a separate chunk, which does not need trimming. Note: same test doesn't fail for sample querier in Test_Querier_OOOQuery as that doesn't use copy, that is copyHeadChunk is false in the if condition above. Case 2: OOO mmaped head chunk overlaps with first mmaped in-order chunk. Query: |----------------------------------------------------------------| InO: |------mmap---------------||---------mem----------------------| OOO: |-----mmap-----------| |--mem--| In this case the meta contains the reference of the in-order chunk and no indication that a merge is needed with the OOO mmaped chunk. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2024-08-23 15:50:47 +02:00
Bryan Boreham	9a74d53935	[BUGFIX] TSDB: Fix query overlapping in-order and ooo head (#14693) * tsdb: Unit test query overlapping in order and ooo head Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * TSDB: Merge overlapping head chunk The basic idea is that getOOOSeriesChunks can populate Meta.Chunk, but since it only returns one Meta per overlapping time-slot, that pointer may end up in a Meta with a head-chunk ID. So we need HeadAndOOOChunkReader.ChunkOrIterable() to call mergedChunks in that case. Previously, mergedChunks was checking that meta.Ref was a valid OOO chunk reference, but it never actually uses that reference; it just finds all chunks overlapping in time. So we can delete that code. Signed-off-by: Bryan Boreham <bjboreham@gmail.com> Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2024-08-21 14:24:20 +01:00
Arve Knudsen	3a78e76282	Upgrade golangci-lint to v1.60.1 Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2024-08-18 12:13:25 +02:00
machine424	82f38d3e9a	fix(tsdb/db_test.go): close the corrupted chunk after creating it to satisfy Windows FS Signed-off-by: machine424 <ayoubmrini424@gmail.com>	2024-08-09 14:53:57 +02:00
machine424	92873d3009	feat: allow to delay head compaction start time helping Prometheus instances to avoid simultaneous compactions and reduce stress on shared resources. This is enabled via `--enable-feature=delayed-compaction`. Signed-off-by: machine424 <ayoubmrini424@gmail.com>	2024-08-07 17:10:27 +02:00
Bryan Boreham	80adc5baf4	Merge remote-tracking branch 'origin/main' into merge-2.54-to-main	2024-08-06 09:19:55 +01:00
Bryan Boreham	bded853035	[Test] TSDB: TestOOOCompaction with samples added after compaction starts Test fails due to bug. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2024-08-05 10:35:34 +01:00
Max Amin	84b819a69f	feat: add Google cloud roundtripper for remote write (#14346) * feat: Google Auth for remote write Signed-off-by: Max Amin <maxamin@google.com> --------- Signed-off-by: Max Amin <maxamin@google.com>	2024-07-30 16:25:19 +01:00
Bryan Boreham	d116bf7b9f	Merge pull request #14109 from harry671003/pass_limit_to_querier storage: pass limit param as hint in querier	2024-07-12 10:27:52 +01:00
Carrie Edwards	55f53330b2	Use storage.ExpandSamples instead of samplesFromIterator Co-authored by: Fiona Liao <fiona.liao@grafana.com>: Signed-off-by: Carrie Edwards <edwrdscarrie@gmail.com>	2024-07-03 09:28:38 -07:00
Carrie Edwards	06550883c1	Clean up of tests and test utils Co-authored by: Fiona Liao <fiona.liao@grafana.com>: Signed-off-by: Carrie Edwards <edwrdscarrie@gmail.com>	2024-07-03 09:28:38 -07:00
Carrie Edwards	45a32a29ef	Update tsdb tests to use test utils. Co-authored-by: Fiona Liao <fiona.liao@grafana.com> Signed-off-by: Carrie Edwards <edwrdscarrie@gmail.com>	2024-07-03 09:28:38 -07:00
Ben Ye	5585a3c7e5	tsdb: expose hook to customize block querier (#14114) * expose hook for block querier Signed-off-by: Ben Ye <benye@amazon.com> * update comment Signed-off-by: Ben Ye <benye@amazon.com> * use defined type Signed-off-by: Ben Ye <benye@amazon.com> --------- Signed-off-by: Ben Ye <benye@amazon.com>	2024-06-25 09:47:06 +02:00

233 Commits