prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-09-21 13:51:00 +02:00

Author	SHA1	Message	Date
György Krajcsovits	5b39b79f5a	refactor error creation and tests Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-09-19 09:26:34 +02:00
György Krajcsovits	b99378f2c4	Merge remote-tracking branch 'origin/main' into krajo/native-histogram-schema-validation	2025-09-19 08:59:00 +02:00
György Krajcsovits	267be7dc20	fix(chunkenc): error out when reading unknown histogram schemas from chunks Otherwise higher level code like PromQL needs to constantly check if it can handle the samples. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-09-18 09:21:03 +02:00
beorn7	bd0bf66f31	tsdb: Include floatHistograms in headAppender.Rollback() Signed-off-by: beorn7 <beorn@grafana.com>	2025-09-17 19:22:25 +02:00
beorn7	b1fbf4f1e2	tsdb: Refactor staleness marker handling With the fixed commit order, we can now handle the conversion of float staleness markers to histogram staleness markers in a more direct way. Signed-off-by: beorn7 <beorn@grafana.com>	2025-09-17 19:22:25 +02:00
beorn7	7e82bdb75b	tsdb: Fix commit order for mixed-typed series Fixes https://github.com/prometheus/prometheus/issues/15177 The basic idea here is to divide the samples to be committed into (sub) batches whenever we detect that the same series receives a sample of a type different from the previous one. We then commit those batches one after another, and we log them to the WAL one after another, so that we hit both birds with the same stone. The cost of the stone is that we have to track the sample type of each series in a map. Given the amount of things we already track in the appender, I hope that it won't make a dent. Note that this even addresses the NHCB special case in the WAL. This does a few other things that I could not resist to pick up on the go: - It adds more zeropool.Pools and uses the existing ones more consistently. My understanding is that this was merely an oversight. Maybe the additional pool usage will compensate for the increased memory demand of the map. - Create the synthetic zero sample for histograms a bit more carefully. So far, we created a sample that always went into its own chunk. Now we create a sample that is compatible enough with the following sample to go into the same chunk. This changed the test results quite a bit. But IMHO it makes much more sense now. - Continuing past efforts, I changed more namings of `Samples` into `Floats` to keep things consistent and less confusing. (Histogram samples are also samples.) I still avoided changing names in other packages. - I added a few shortcuts `h := a.head`, saving many characters. TODOs: - Address @krajorama's TODOs about commit order and staleness handling. Signed-off-by: beorn7 <beorn@grafana.com>	2025-09-17 19:22:25 +02:00
beorn7	46cfc9fb99	tsdb: Extend TestDataNotAvailableAfterRollback This exposes the omission of float histograms from the rollback. Signed-off-by: beorn7 <beorn@grafana.com>	2025-09-17 19:22:25 +02:00
Bryan Boreham	11c49151b7	[REFACTOR] TSDB chunks: replace magic numbers with constants (#17095) For size of header and position of flags byte. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2025-09-02 16:05:21 +01:00
Bryan Boreham	aa12c0d4c3	Merge pull request #17074 from prymitive/logs TSDB: Log when GC / block write starts	2025-09-02 12:55:12 +01:00
Bryan Boreham	8e133e100f	Merge pull request #17081 from prometheus/superq/if_err_nil tsdb: Fixup err nil checks	2025-09-02 12:37:51 +01:00
bwplotka	794bf774c2	Reapply "prw: use Unit and Type labels for metadata when feature flag is enabled (#17033)" This reverts commit f5fab4757733746a708e7b80324b8929c1b84856.	2025-08-29 08:16:37 +01:00
bwplotka	f5fab47577	Revert "prw: use Unit and Type labels for metadata when feature flag is enabled (#17033)" This reverts commit c808a71e18d4d1cc91e1d06859ebeae818465324.	2025-08-29 08:15:28 +01:00
Jonathan	c808a71e18	prw: use Unit and Type labels for metadata when feature flag is enabled (#17033) * chore: send Unit and Type when feature flag is enabled Signed-off-by: perebaj <perebaj@gmail.com> * remove unused code and comments Signed-off-by: perebaj <perebaj@gmail.com> * remove unreal scenario Signed-off-by: perebaj <perebaj@gmail.com> * remove unused if Signed-off-by: perebaj <perebaj@gmail.com> * remove unused labels Signed-off-by: perebaj <perebaj@gmail.com> * linter Signed-off-by: perebaj <perebaj@gmail.com> * enable type and unit through remotewrite config Signed-off-by: perebaj <perebaj@gmail.com> * remove test comment and capture type and unit when flag is enabled Signed-off-by: perebaj <perebaj@gmail.com> * gofumpt Signed-off-by: perebaj <perebaj@gmail.com> * modelTypeToWriteV2Type Signed-off-by: perebaj <perebaj@gmail.com> * use NewMetadataFromLabels Signed-off-by: perebaj <perebaj@gmail.com> * capture feature flag from main Signed-off-by: perebaj <perebaj@gmail.com> * simplifying logic Signed-off-by: perebaj <perebaj@gmail.com> * remove unused function Signed-off-by: perebaj <perebaj@gmail.com> * formatting code Signed-off-by: perebaj <perebaj@gmail.com> * gofumpt Signed-off-by: perebaj <perebaj@gmail.com> * remove public var: EnableTypeAndUnitLabels Signed-off-by: perebaj <perebaj@gmail.com> * remove enableTypeAndUnitLabels from TestPopulateV2TimeSeries_typeAndUnitLabels Signed-off-by: perebaj <perebaj@gmail.com> * remove enableTypeAndUnitLabels from main Signed-off-by: perebaj <perebaj@gmail.com> * use schema helper to populate metadata Signed-off-by: perebaj <perebaj@gmail.com> * remove metadata since nil is the default value Signed-off-by: perebaj <perebaj@gmail.com> * add TestPopulateV2TimeSeries_UnexpectedMetadata Signed-off-by: perebaj <perebaj@gmail.com> * Update storage/remote/queue_manager_test.go Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> --------- Signed-off-by: perebaj <perebaj@gmail.com> Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com> Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>	2025-08-29 04:10:01 +00:00
Björn Rabenstein	ba808d1736	Merge pull request #17092 from prometheus/beorn7/cleanup Apply analyzer modernize to the whole codebase	2025-08-28 00:42:33 +02:00
Bryan Boreham	2fb50b12cd	[PERF] TSDB: Optimize appender creation on empty chunks (#16922) Skip creating an iterator and walking all through any existing values, when we can easily tell there are no existing values. This is the normal case - the TSDB head creates an appender immediately after creating every chunk. Remove redundant handling of empty chunks. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2025-08-27 17:11:08 +01:00
beorn7	23f1d3ba25	tsdb: Remove unused `Layout()` methods Both `HistogramChunk` and `FloatHistogramChunk` have a `Layout()` method for historical reasons. As it has turned out, these methods are unused and also buggy. This commit simply removes them. Signed-off-by: beorn7 <beorn@grafana.com>	2025-08-27 17:01:23 +02:00
beorn7	71c21fb9e4	Fix minor issues after applying analyzer "modernize" - The tool left an empty line behind that we don't need anymore, see https://github.com/prometheus/prometheus/pull/17092. (Arguably not a bug in the tool but just our stricter style about empty lines.) - In tsdb/index/postings_test.go, our (admittedly somewhat convoluted) code structure tricked the tool so it spit out something that wouldn't even compile. - storage/remote/queue_manager_test.go is just a minor formatting nit. Signed-off-by: beorn7 <beorn@grafana.com>	2025-08-27 15:44:11 +02:00
beorn7	747c5ee2b1	Apply analyzer "modernize" to the whole codebase See https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize for details. This ran into a few issues (arguably bugs in the modernize tool), which I will fix in the next commit, so that we have transparency what was done automatically. Beyond those hiccups, I believe all the changes applied are legitimate. Even where there might be no tangible direct gain, I would argue it's still better to use the "modern" way to avoid micro discussions in tiny style PRs later. Signed-off-by: beorn7 <beorn@grafana.com>	2025-08-27 14:48:41 +02:00
Lukasz Mierzwa	31282d67b7	Log when GC / block write starts Right now Prometheus only logs when these operations are completed. It's a bit surprising to see suddenly a message saying "I was busy doing X for the past N minutes" so let's add a message when the operation starts, so it's easier to understand what Prometheus was doing at any point in time when reading logs. Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>	2025-08-26 10:30:22 +01:00
SuperQ	b87cbf0294	Fixup err nil checks Cleanup double `if` statements for errors being nil / not-nil. Signed-off-by: SuperQ <superq@gmail.com>	2025-08-25 17:37:02 +02:00
Minh Nguyen	c8deefb038	[tsdb] Add CounterResetHint: CounterReset to synthetic zero sample (#17011) Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>	2025-08-21 23:26:01 +02:00
Bryan Boreham	498f63e60b	Merge pull request #17029 from pr00se/wal-checkpoint-dropped-samples TSDB: use timestamps rather than WAL segment numbers to track how long deleted series should be retained in checkpoints	2025-08-20 11:15:10 +01:00
Ganesh Vernekar	a86d9a3858	Merge pull request #16925 from prometheus/codesome/stale-series-tracking tsdb: Track stale series in the Head block based on stale sample	2025-08-19 15:35:19 -07:00
Patryk Prus	bbc9e47e42	Add comment about differences between agent mode and regular Prometheus Signed-off-by: Patryk Prus <p@trykpr.us>	2025-08-19 18:33:52 -04:00
Ganesh Vernekar	3904b3cd5f	Restore stale series count from chunk snapshots Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>	2025-08-19 15:07:37 -07:00
Ganesh Vernekar	b29ce3e489	Restore stale series count on WAL replay Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>	2025-08-19 15:07:37 -07:00
Ganesh Vernekar	0c3d3d7466	Test the stale series tracking in Head Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>	2025-08-19 15:07:37 -07:00
Ganesh Vernekar	7a947d3629	Track stale series in the Head Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>	2025-08-19 15:07:27 -07:00
Victor Herrero Otal	0cbbc9b7d3	docs: Fix chunk format documentation for `varint` encoding While preparing PR #16701, we identified an inconsistency in the chunk format documentation. The `varint` encoding can require up to 10 bytes for a 64-bit integer, such as when timestamps are encoded. However, the chunk length field is a 32-bit integer, which requires at most 5 bytes in `varint` encoding. This is reflected in the code, where a maximum of 5 bytes are read when parsing the chunk length. `50ba25f273/tsdb/chunks/chunks.go (L709-L711)` `50ba25f273/tsdb/chunks/chunks.go (L47-L48)` Co-authored-by: Istvan Zoltan Ballok <istvan.zoltan.ballok@sap.com> Signed-off-by: Victor Herrero Otal <victor.herrero.otal@sap.com>	2025-08-15 10:56:21 +02:00
pipiland2612	82a4b12507	Add t.parallel() for ./tsdb Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>	2025-08-12 14:12:42 +02:00
pipiland2612	de93387f0b	Parallel tsdb/wlog test Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>	2025-08-12 14:12:15 +02:00
Patryk Prus	676f7665fa	Use testutil.RequireEqual to handle dedupelabels in test Signed-off-by: Patryk Prus <p@trykpr.us>	2025-08-08 14:52:03 -04:00
Patryk Prus	ead6dc32b9	Fix test Signed-off-by: Patryk Prus <p@trykpr.us>	2025-08-08 14:34:56 -04:00
Patryk Prus	5cb0192626	Address linter errors Signed-off-by: Patryk Prus <p@trykpr.us>	2025-08-08 14:25:14 -04:00
Patryk Prus	0fea41ed53	Refactor keep function to work for both agent and non-agent implementations Signed-off-by: Patryk Prus <p@trykpr.us>	2025-08-08 14:12:47 -04:00
Patryk Prus	6875022873	Update head.walExpiries with record timestamps during WAL replay Signed-off-by: Patryk Prus <p@trykpr.us>	2025-08-08 14:12:47 -04:00
Patryk Prus	218558f543	Store mint rather than the last WAL segment in head.walExpiries during head GC Signed-off-by: Patryk Prus <p@trykpr.us>	2025-08-08 14:12:41 -04:00
Bryan Boreham	ba2a8aba8c	Merge pull request #17000 from bboreham/clarify-intersect [REFACTOR] TSDB: Clarify intersectPostings	2025-08-06 12:07:10 +01:00
Bartlomiej Plotka	5df982538f	Merge pull request #16994 from mmorel-35/unused-parameters chore: enable unused-receiver rule from revive	2025-08-06 10:46:28 +01:00
Bartlomiej Plotka	291e2ae090	Merge pull request #17006 from sujalshah-bit/fix_wal_dir_bug wal: ignore os.ErrNotExist errors in DirSize during WAL size calculation	2025-08-06 10:40:02 +01:00
Ganesh Vernekar	64808d4f56	Merge pull request #16968 from pipiland2612/Remove_label_index tsdb: Remove writing Label Index and Label Offset Table in the index	2025-08-05 15:12:44 -07:00
Sujal Shah	17c58f5fce	wal: ignore os.ErrNotExist errors in DirSize during WAL size calculation This change updates `DirSize` to ignore `os.ErrNotExist` errors, since they are expected during normal WAL cleanup. All other errors continue to propagate. Fixes: #17005 Signed-off-by: Sujal Shah <sujalshah28092004@gmail.com>	2025-08-05 22:41:46 +05:30
Bryan Boreham	e068c7332d	[REFACTOR] TSDB: Clarify intersectPostings This is intended to make `intersectPostings` easier to follow. Instead of cryptic `arr` and `cur`, name the members `postings` and `current`. Instead of updating `cur` to intermediate values encountered during operations, introduce a local variable `target` meaning the ref we might expect to find next, and only update `current` when an intersection is found. Name the function which implements seeking `Seek` instead of `doNext`. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2025-08-05 13:09:29 +01:00
Alan Protasio	25aee26a57	Improving "Sparse postings" intersection (#13971) Let's take the given example: P1: [2, 5, 9, 18, 21] P2: [3, 7, 14, 19, 21] P3: [1, 21] Currently, we would only advance through P1 and P2 until discovering an intersection and then checking P3. In essence, the traversal order was: 2, 3, 5, 7, 9, 14, 18, 19, 21 (intersection found). With the proposed change, P3 is also examined even if P1 and P2 haven't found an intersection yet. This adjustment allows for the possibility of skipping some iterations. Post-change, the traversal order becomes: 2, 3, 21 (3 iterations instead of 9). Signed-off-by: alanprot <alanprot@gmail.com>	2025-08-05 12:22:54 +01:00
Matthieu MOREL	cef219c31c	chore: enable unused-receiver rule from revive Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2025-08-04 09:43:33 +00:00
pipiland2612	8b24acb729	Remove label index and label offset index Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>	2025-08-01 13:50:49 +03:00
Julius Volz	2e709c6567	Merge pull request #16695 from sujalshah-bit/block_endpoint api: Create `/status/tsdb/blocks` endpoint.	2025-07-31 18:15:49 +02:00
George Krajcsovits	3f59fe1a80	fix(chunkenc): appending histograms with empty buckets (#16893) * test(chunkenc): appending histograms with empty buckets and gaps Append such native histograms that have empty buckets and gaps between the indexes of those buckets. There is a special case for appending counter native histograms to a chunk in TSDB: if we append a histogram that is missing some buckets that are already in chunk, then usually that's a counter reset. However if the missing bucket is empty, meaning its value is 0, then we don't consider it missing. For this case to trigger, we need to write empty buckets into the chunk. Normally native histograms are compacted when we emit them, so this is very rare and compact makes sure that there are no multiple continuous empty buckets with gaps between them. The code that I've added in #14513 did not take into account that you can bypass compact and write histograms with many empty buckets, with gaps between them. These are still valid, so the code has to account for them. Main fix in the expandIntSpansAndBuckets and expandFloatSpansAndBuckets functions. I've also refactored them for clarity. Consequently needed to fix insert and adjustForInserts to also allow gaps between inserts. I've added some new test cases (data driven test would be nice here, too many cases). And removed the deprecated old function that didn't deal with empty buckets at all. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com> Co-authored-by: Björn Rabenstein <beorn@grafana.com>	2025-07-24 18:01:02 +02:00
machine424	9a0bbb60bc	test(tsdb): disable TestDelayedCompaction/delayed_compaction_enabled on windows as flaky because of Time imprecision fixes https://github.com/prometheus/prometheus/issues/16450 Signed-off-by: machine424 <ayoubmrini424@gmail.com>	2025-07-22 15:30:05 +01:00
Charles Korn	46acc974c0	fix(remote): Unregister metrics emitted by `remote.WriteStorage` when closed (#16868) * Unregister metrics emitted by `remote.WriteStorage` when closed Signed-off-by: Charles Korn <charles.korn@grafana.com> * Address PR feedback: add test Signed-off-by: Charles Korn <charles.korn@grafana.com> --------- Signed-off-by: Charles Korn <charles.korn@grafana.com>	2025-07-17 11:32:15 +02:00


1413 Commits
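
Note on commit 0cbbc9b7d3 (chunk format documentation): the byte counts quoted in that message can be checked with Go's standard `encoding/binary` package. This is a small sketch, assuming the unsigned varint encoding (`binary.PutUvarint`); the helper name `uvarintLen` is hypothetical and does not come from the Prometheus code base.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// uvarintLen reports how many bytes the unsigned varint encoding of v occupies.
// (Hypothetical helper for illustration.)
func uvarintLen(v uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, v)
}

func main() {
	fmt.Println(uvarintLen(math.MaxUint32)) // largest 32-bit value: 5 bytes
	fmt.Println(uvarintLen(math.MaxUint64)) // largest 64-bit value: 10 bytes
}
```

Each varint byte carries 7 bits of payload, so a 32-bit value needs at most 5 bytes and a 64-bit value at most 10.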