Commit Graph

1330 Commits

Author SHA1 Message Date
Oleg Zaytsev
e4fe8d8684
Create memSeries with pendingCommit=true
This fixes TestHead_RaceBetweenSeriesCreationAndGC.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2025-03-27 11:11:57 +01:00
Oleg Zaytsev
df33f1aace Add TestHead_RaceBetweenSeriesCreationAndGC
This test consistently fails, missing ~10 series.
If it doesn't fail on your machine, just increase totalSeries; that's
how race conditions work.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2025-03-27 10:56:24 +01:00
Matthieu MOREL
5fa1146e21
chore: enable gci linter (#16245)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-03-22 15:46:13 +00:00
Patryk Prus
2f43a5a3ab
TSDB: don't process exemplars older than minValidTime during WAL replay
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-03-19 16:24:08 -04:00
Ganesh Vernekar
bc595263c1
Merge pull request #16231 from pr00se/multiref-improvements
TSDB: Handle metadata/tombstones/exemplars for duplicate series during WAL replay
2025-03-19 16:15:50 -04:00
pudongair
308c8c48c1
chore: fix some comments (#16237)
Signed-off-by: pudongair <744355276@qq.com>
2025-03-19 16:28:34 +01:00
Ziqi Zhao
f6903bcc22
Let HistogramAppender.appendable return CounterResetHeader instead of… (#16195)
Let HistogramAppender.appendable return CounterResetHeader instead of boolean

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>

---------

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2025-03-18 17:40:27 +01:00
Patryk Prus
e4e1b515bc
TSDB: Handle metadata/tombstones/exemplars for duplicate series during WAL replay
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-03-18 12:22:33 -04:00
Fiona Liao
37c2ebb5fd
Make out-of-order native histograms flag a no-op and always enable (#16207)
* Remove experimental out-of-order native histogram flag

This feature has been available in Prometheus since September 2024,
and has no known issues. We therefore propose removing the flag
entirely and always having it on. Note that there are still two
settings that need to be configured (out-of-order time window > 0
and native histograms enabled) for this feature to work.

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>

* Update CHANGELOG

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>

* Keep feature flag with warning

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>

* Update CHANGELOG

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>

* Update tsdb/head_append.go

Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>

* Update CHANGELOG.md

Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>

* Update tsdb/head_append.go

Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>

* Additional cleanup of comments and test names

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>

---------

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
2025-03-18 10:59:02 +00:00
Patryk Prus
86eeaf1886
Skip writing series records uniformly across the benchmark, so we skip some OOO series as well
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-03-17 15:17:53 -04:00
Patryk Prus
2147538d1e
Add missing series refs to benchmark
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-03-17 15:17:53 -04:00
Patryk Prus
401dbacf2e
Add counters for unknown series references during WAL/WBL replay
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-03-17 15:17:53 -04:00
Patryk Prus
85fa39032e
TSDB: Track count of unknown series referenced during WAL replay
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-03-17 15:17:48 -04:00
Bryan Boreham
30d04792ca
[PERF] Remote-write: re-use memory to read WAL data (#16197)
The `:=` causes new variables to be created, which means the outer
slice stays nil, and new memory is allocated on every loop
iteration.

Extracted from https://github.com/prometheus/prometheus/pull/16182
Credit to @bwplotka.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-03-11 10:49:51 +00:00
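
A minimal, self-contained Go sketch of the bug described above (variable names are illustrative, not the actual remote-write code): `:=` inside the loop declares a new slice that shadows the outer one, so every iteration allocates fresh memory, while assigning into the outer slice lets the buffer be reused.

package main

import "fmt"

func main() {
	var buf []byte // outer slice we intend to reuse across iterations

	for i := 0; i < 3; i++ {
		// Buggy pattern: `buf := readRecord(i)` would declare a *new* buf
		// scoped to the loop body; the outer buf stays nil and fresh memory
		// is allocated on every pass.
		//
		// Reusing pattern: assign into the outer slice, truncating it first
		// so append can reuse the backing array once it has grown.
		buf = append(buf[:0], fmt.Sprintf("record %d", i)...)
		fmt.Println(len(buf), cap(buf))
	}
}
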
Bartlomiej Plotka
7a7bc65237
Add util/compression package to consolidate snappy/zstd use in Prometheus. (#16156)
# Conflicts:
#	tsdb/db_test.go

Apply suggestions from code review

Addressed comments.

Update util/compression/buffers.go

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Arthur Silva Sens <arthursens2005@gmail.com>
2025-03-10 10:36:26 +00:00
Arve Knudsen
56929ffa42 Upgrade to Go v1.24 (#16180)
* Upgrade to Go v1.24
* Upgrade golangci-lint

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-03-07 11:28:26 +01:00
Patryk Prus
61aa82865d
TSDB: keep duplicate series records in checkpoints while their samples may still be present (#16060)
Renames the head's deleted map to walExpiries, and creates entries for any
duplicate series records encountered during WAL replay, with the expiry set
to the highest current WAL segment number. Any subsequent WAL
checkpoints will see the duplicate series entry in the walExpiries map, and
keep the series record until the last WAL segment that could contain its
samples is deleted.

Other considerations:

WBL: series records aren't written to the WBL, so there are no duplicates to deal with
agent mode: has its own WAL replay logic that handles duplicate series records differently, and is outside the scope of this PR
2025-03-05 13:45:08 -05:00
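
A rough sketch of the walExpiries bookkeeping described above, with illustrative types rather than the actual Head fields: a duplicate series ref seen during replay is remembered together with the last WAL segment that may still hold its samples, and a checkpoint keeps the series record until that segment has been deleted.

package main

import "fmt"

// walExpiries maps a series ref to the highest WAL segment that may still
// contain samples for it. Types here are illustrative, not the real Head fields.
type walExpiries map[uint64]int

// keepSeriesRecord reports whether a checkpoint covering WAL segments up to
// and including lastSegment must still carry the series record for ref.
func (w walExpiries) keepSeriesRecord(ref uint64, lastSegment int) bool {
	expiry, ok := w[ref]
	return ok && expiry > lastSegment
}

func main() {
	we := walExpiries{}

	// During WAL replay a duplicate record for series 42 is found while the
	// highest current WAL segment is 7, so its expiry is set to 7.
	we[42] = 7

	fmt.Println(we.keepSeriesRecord(42, 5)) // true: segment 7 may still hold its samples
	fmt.Println(we.keepSeriesRecord(42, 7)) // false: every segment that could is gone
}
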
Arve Knudsen
7cbf749096
Upgrade to github.com/oklog/ulid/v2 (#16168)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-03-05 16:03:25 +01:00
Bryan Boreham
42d55505f9
Merge pull request #12659 from prymitive/memChunk
Short-cut common memChunk operations
2025-02-25 11:33:56 +00:00
Matthieu MOREL
c7d4b53ec1 chore: enable unused-parameter from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-02-19 19:50:28 +01:00
Ayoub Mrini
e04913aea2
Merge pull request #15778 from machine424/reuse-pools
feat(tsdb/(head|agent)): reuse pools across segments to reduce garbage during WL replay
2025-02-17 12:48:17 +01:00
Bartlomiej Plotka
de23a9667c
prw2: Split PRW2.0 from metadata-wal-records feature (#16030)
Rationales:

* metadata-wal-records might be deprecated and replaced going forward: https://github.com/prometheus/prometheus/issues/15911
* PRW 2.0 works without metadata just fine (although it sends untyped metrics as expected).

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-02-13 12:16:33 +00:00
machine424
d644324407
feat(tsdb/(head|agent)): reuse pools across segments to avoid generating garbage during WL replay
This is part of the "reduce WAL replay overhead/garbage" effort to help with https://github.com/prometheus/prometheus/issues/6934.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-02-10 22:40:24 +01:00
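
A minimal sketch of the change described above, with hypothetical helper names: the scratch buffers are pooled once, before the per-segment replay loop, rather than inside it, so each segment reuses the previous segment's allocations instead of leaving them for the garbage collector.

package main

import (
	"bytes"
	"fmt"
	"sync"
)

// replaySegment is a hypothetical stand-in for per-segment WAL replay; it
// borrows a scratch buffer from the shared pool instead of allocating its own.
func replaySegment(seg int, bufPool *sync.Pool) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf) // hand the buffer back for the next segment

	fmt.Fprintf(buf, "decoded records of segment %d", seg)
	fmt.Println(buf.String())
}

func main() {
	// Pool created once, outside the per-segment loop, and shared by all
	// segments, so replay stops generating a fresh set of buffers per segment.
	bufPool := &sync.Pool{New: func() any { return new(bytes.Buffer) }}

	for seg := 0; seg < 3; seg++ {
		replaySegment(seg, bufPool)
	}
}
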
Matthieu MOREL
b472ce7010 chore: enable early-return from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-02-10 22:08:43 +01:00
Bryan Boreham
b74cebf6bf
Merge pull request #12920 from prymitive/compactLock
Fix locks in db.reloadBlocks()
2025-02-10 17:35:09 +00:00
Dimitar Dimitrov
686dcc7b0d
headIndexReader: reduce debug logging (#15993)
Around Mimir compactions we see the debug logging in ShardedPostings cause massive allocations and drive GC up to 50% of CPU.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2025-02-07 15:46:55 +00:00
SuryaPrakash
cb3b17a14c
fix: os.MkdirTemp with t.TempDir (#15860)
Signed-off-by: Surya Prakash <surya0prakash@proton.me>
2025-01-31 14:32:20 +00:00
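
A small sketch of the cleanup pattern this change applies, using a hypothetical test (not one of the touched files): t.TempDir() hands out a per-test directory that the testing package removes automatically, replacing manual os.MkdirTemp plus a deferred os.RemoveAll.

package example

import (
	"os"
	"path/filepath"
	"testing"
)

func TestWritesScratchFile(t *testing.T) {
	// Before: dir, err := os.MkdirTemp("", "tsdb-test")
	//         ...check err, then defer os.RemoveAll(dir)
	//
	// After: the testing framework creates and cleans up the directory.
	dir := t.TempDir()

	path := filepath.Join(dir, "scratch.db")
	if err := os.WriteFile(path, []byte("data"), 0o600); err != nil {
		t.Fatal(err)
	}
}
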
Alan Protasio
9d1abbb9ed
Call PostCreation callback only after the new series is added to the memPostings (#15579)
Signed-off-by: alanprot <alanprot@gmail.com>
2025-01-28 12:11:58 +01:00
Jan Fajerski
6823f58e59
Merge pull request #15732 from bboreham/benchmark-setup-append-periodically
TSDB benchmarks: Commit periodically to speed up init
2025-01-28 11:35:04 +01:00
Bryan Boreham
6ba25ba93f tsdb tests: avoid 'defer' till end of function
'defer' only runs at the end of the function, so explicitly close the
querier after we finish with it. Also check it didn't error.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-01-27 19:59:43 +00:00
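
A sketch of the pattern with a hypothetical querier type: since 'defer' only runs when the whole function returns, a long test should close the querier explicitly as soon as it is done with it, and check the Close error, rather than deferring the call.

package main

import (
	"fmt"
	"log"
)

type querier struct{}

func (q *querier) Close() error { return nil }

func main() {
	q := &querier{}

	// ... use q for a few assertions ...
	fmt.Println("queried")

	// Close as soon as we're done with it, instead of `defer q.Close()`,
	// which would keep the querier (and whatever it pins) alive until the
	// very end of the function. Also check the returned error.
	if err := q.Close(); err != nil {
		log.Fatal(err)
	}

	// ... the rest of the (long) test continues with the querier closed ...
}
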
Bryan Boreham
2f615a200d tsdb tests: restrict some 'defer' operations
'defer' only runs at the end of the function, so introduce some more
functions / move the start, so that 'defer' can run at the end of the
logical block.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-01-27 19:59:43 +00:00
Bryan Boreham
f4fbe47254 tsdb tests: avoid capture-by-reference in goroutines
Only one version of the variable is captured; this is a source of race conditions.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-01-27 19:59:43 +00:00
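
A self-contained sketch of the race class being fixed (illustrative, not the actual test code): a goroutine that captures a shared variable by reference sees whatever value it holds when the goroutine finally runs, so each goroutine should receive the value as an argument instead.

package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup

	var shared int // a single variable, re-assigned on every iteration
	for i := 0; i < 3; i++ {
		shared = i

		wg.Add(1)
		// Racy: the closure captures `shared` by reference, so every
		// goroutine reads whatever value it holds when it finally runs.
		// go func() { defer wg.Done(); fmt.Println(shared) }()

		// Safe: pass the current value as an argument, giving each
		// goroutine its own copy.
		go func(v int) { defer wg.Done(); fmt.Println(v) }(shared)
	}
	wg.Wait()
}

Note that Go 1.22's per-iteration loop variables don't help here, because the captured variable is declared outside the loop.
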
piguagua
a82f2b8168
chore: fix function name and struct name in comment (#15827)
Signed-off-by: piguagua <piguagua@aliyun.com>
2025-01-17 21:26:08 +01:00
Julius Volz
0d7db907a9
Merge pull request #15785 from crystalstall/main
refactor: using slices.Contains to simplify the code
2025-01-13 10:31:41 +01:00
crystalstall
616914abe2 refactor: using slices.Contains to simplify the code

Signed-off-by: crystalstall <crystalruby@qq.com>
2025-01-11 00:41:51 +08:00
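
A small sketch of the simplification (hypothetical slice and value, not the code touched by the PR): slices.Contains from the standard library replaces the hand-rolled search loop.

package main

import (
	"fmt"
	"slices"
)

func main() {
	names := []string{"alpha", "beta", "gamma"}

	// Before: an explicit loop.
	found := false
	for _, n := range names {
		if n == "beta" {
			found = true
			break
		}
	}

	// After: the same check with the standard library helper (Go 1.21+).
	fmt.Println(found, slices.Contains(names, "beta"))
}
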
Lukasz Mierzwa
e3728122b2 Update comments for methods that require a lock
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-01-09 17:20:10 +00:00
Lukasz Mierzwa
a1740cd2e7 Remove unnecessary locks
Compact() is an exported method that handles its own locking, so we shouldn't hold a lock around the call.

Signed-off-by: Lukasz Mierzwa <lukasz@cloudflare.com>
2025-01-09 17:06:05 +00:00
Łukasz Mierzwa
d106b3beb7 Wrap db.blocks read in a read lock
We don't hold the db.mtx lock when reading db.blocks here, so we need a read lock around this loop.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2025-01-09 17:06:05 +00:00
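
A minimal sketch of the locking pattern (toy struct, not the actual DB type): readers of the shared blocks slice hold the read lock for the duration of the loop, while writers that swap the slice take the write lock.

package main

import (
	"fmt"
	"sync"
)

type db struct {
	mtx    sync.RWMutex
	blocks []string // stands in for []*Block
}

// forEachBlock iterates the shared slice under the read lock, mirroring the
// loop this change wraps in RLock()/RUnlock().
func (d *db) forEachBlock(fn func(string)) {
	d.mtx.RLock()
	defer d.mtx.RUnlock()
	for _, b := range d.blocks {
		fn(b)
	}
}

// swapBlocks replaces the slice under the write lock (e.g. after a reload).
func (d *db) swapBlocks(blocks []string) {
	d.mtx.Lock()
	defer d.mtx.Unlock()
	d.blocks = blocks
}

func main() {
	d := &db{blocks: []string{"01ABC", "01DEF"}}
	d.swapBlocks([]string{"01ABC", "01DEF", "01GHI"})
	d.forEachBlock(func(b string) { fmt.Println(b) })
}
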
Łukasz Mierzwa
92788d313a Remove TestTombstoneCleanRetentionLimitsRace
This test ensures that running db.reloadBlocks() and db.CleanTombstones() at the same time doesn't race.
The problem is that CleanTombstones() is a public method while reloadBlocks() is internal.
CleanTombstones() acquires the db.cmtx lock, while reloadBlocks() is not protected by any locks at all; it expects the public method through which it was called to do that.
So a race between these two is not unexpected, and we shouldn't really be testing it.
db.cmtx ensures that no other function can be modifying the list of open blocks, so the scenario tested here cannot happen.
If it did happen, it could only be because some other method doesn't acquire the db.cmtx lock, something this test cannot detect.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2025-01-09 17:06:03 +00:00
Łukasz Mierzwa
b880cea613 Fix locks in db.reloadBlocks()
This partially reverts ae3d392aa9.

ae3d392aa9 added a call to db.mtx.Lock() that lasts for the entire duration of db.reloadBlocks();
previously db.mtx was locked only during the critical part of db.reloadBlocks().
The motivation was to protect against races:
9e0351e161 (r555699794)
The 'reloads' being mentioned are (I think) reloadBlocks() calls, rather than db.reload() or other methods.
TestTombstoneCleanRetentionLimitsRace was added to catch this, but I was never able to get any error out of it, even after disabling all calls to db.mtx in reloadBlocks() and CleanTombstones().
To make things more complicated, CleanTombstones() itself calls reloadBlocks(), so it seems that the real issue is that we might have concurrent calls to reloadBlocks().

The problem with this change is that db.reloadBlocks() can take a very long time, because it might need to load very large blocks from disk, which is slow.
While db.mtx is locked a large part of the db is blocked, including queries, since the db.mtx read lock is needed for the db.Querier() call.
One way this manifests itself is as a gap in all metrics and blocked queries just after a large block compaction happens.
When compaction merges multiple day-or-more blocks into a week-or-more block it creates a single very big block.
After that block is written it needs to be loaded, and that seems to take many seconds (30-45), during which mtx is held and everything is blocked.

Turns out that there is another lock that is more fine-grained and aimed at this specific use case:

// cmtx ensures that compactions and deletions don't run simultaneously.
cmtx sync.Mutex

All calls to reloadBlocks() are wrapped inside the cmtx lock. The only exception is db.reload(), which this change fixes.
We can't acquire cmtx inside reloadBlocks() itself because it's called by a number of functions, some of which already hold cmtx.

Looking at the code I think it is sufficient to hold cmtx and skip the reloadBlocks()-wide mtx lock.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2025-01-09 17:05:39 +00:00
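
A condensed sketch of the locking scheme described above; the field and method names follow the commit text, but everything else is illustrative. Callers that reload blocks hold the coarse cmtx, so compactions, deletions and reloads stay mutually exclusive, while mtx is only taken around the short critical section that swaps the block list, keeping queriers unblocked during the slow disk work.

package main

import (
	"fmt"
	"sync"
	"time"
)

type db struct {
	mtx    sync.RWMutex // guards the in-memory block list; held only briefly
	cmtx   sync.Mutex   // ensures compactions, deletions and reloads don't overlap
	blocks []string
}

func (d *db) reloadBlocks(loaded []string) {
	// Slow part: reading newly written blocks from disk. Done without mtx,
	// so queriers (which need the mtx read lock) are not blocked for 30-45s.
	time.Sleep(10 * time.Millisecond)

	// Fast part: swap the block list under the write lock.
	d.mtx.Lock()
	d.blocks = loaded
	d.mtx.Unlock()
}

// reload mirrors db.reload(): it takes cmtx before calling reloadBlocks, like
// every other caller, instead of relying on a reloadBlocks-wide mtx lock.
func (d *db) reload(loaded []string) {
	d.cmtx.Lock()
	defer d.cmtx.Unlock()
	d.reloadBlocks(loaded)
}

func main() {
	d := &db{}
	d.reload([]string{"01ABC"})
	fmt.Println(d.blocks)
}
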
Arve Knudsen
f030894c2c
Fix issues raised by staticcheck (#15722)
Fix issues raised by staticcheck

We are not enabling staticcheck explicitly, though, because it has too many false positives.

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-01-09 17:51:26 +01:00
Ben Ye
919a5b657e
Expose ListPostings Length via Len() method (#15678)
tsdb: expose remaining ListPostings Length

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2025-01-07 17:58:26 +01:00
György Krajcsovits
1e420ef373 Merge branch 'main' into cedwards/nhcb-wal-wbl
# Conflicts:
#	tsdb/tsdbutil/histogram.go
2025-01-02 12:50:19 +01:00
György Krajcsovits
a7ccc8e091 record_test.go: avoid captures, simply return test refs
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-01-02 12:45:20 +01:00
Bryan Boreham
096e2aa7bd
Merge pull request #14518 from bboreham/faster-listpostings-merge
TSDB: Optimization: Merge postings using concrete type
2025-01-02 10:43:45 +00:00
Bryan Boreham
b2fa1c9524 TSDB benchmarks: Commit periodically to speed up init
When creating dummy data for benchmarks, call `Commit()` periodically to
avoid growing the appender to enormous size.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-12-30 17:42:56 +00:00
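
A sketch of the setup pattern described above (hypothetical appender type, not the storage.Appender interface): when generating dummy data, commit every few thousand samples instead of once at the end, so the pending appender never grows to an enormous size.

package main

import "fmt"

// appender is a stand-in for an appender that buffers samples until Commit.
type appender struct{ pending int }

func (a *appender) Append() { a.pending++ }
func (a *appender) Commit() { a.pending = 0 }

func main() {
	const (
		totalSamples   = 1_000_000
		commitInterval = 10_000 // flush periodically instead of once at the end
	)

	app := &appender{}
	for i := 0; i < totalSamples; i++ {
		app.Append()
		if (i+1)%commitInterval == 0 {
			app.Commit()
		}
	}
	app.Commit() // flush the remainder
	fmt.Println("pending after setup:", app.pending)
}
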
johncming
061400e31b
tsdb: export CheckpointPrefix constant (#15636)
Exported the CheckpointPrefix constant to be used in other packages.
Updated references to the constant in db.go and checkpoint.go files.
This change improves code readability and maintainability.

Signed-off-by: johncming <johncming@yahoo.com>
Co-authored-by: johncming <conjohn668@gmail.com>
2024-12-29 17:54:45 +01:00
Carrie Edwards
1508149184 Update benchmark test and comment 2024-12-27 09:09:13 -08:00
Bryan Boreham
cfa32f3d28 TSDB: Move merge of head postings into index
This enables it to take advantage of a more compact data structure
since all postings are known to be `*ListPostings`.

Remove the `Get` member which was not used for anything else, and fix up
tests.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-12-20 19:22:30 +00:00
Bryan Boreham
0a8779f46d TSDB: Make mergedPostings generic
Now we can call it with more specific types which is more efficient than
making everything go through the `Postings` interface.

Benchmark the concrete type.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-12-20 17:09:21 +00:00
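
A toy sketch of the generics idea, not the actual mergedPostings code: parameterizing the merge over the concrete postings type lets callers that only ever hold *ListPostings skip interface dispatch on every Next/At call.

package main

import "fmt"

// postings is a minimal stand-in for the index.Postings iterator interface.
type postings interface {
	Next() bool
	At() uint64
}

// listPostings is a toy version of the concrete type all head postings are
// known to be.
type listPostings struct {
	refs []uint64
	cur  int
}

func (l *listPostings) Next() bool { l.cur++; return l.cur <= len(l.refs) }
func (l *listPostings) At() uint64 { return l.refs[l.cur-1] }

// merge is generic over the concrete postings type; instantiating it with
// *listPostings avoids going through the interface on every Next/At call.
func merge[P postings](a, b P) []uint64 {
	var out []uint64
	aOK, bOK := a.Next(), b.Next()
	for aOK && bOK {
		switch av, bv := a.At(), b.At(); {
		case av < bv:
			out = append(out, av)
			aOK = a.Next()
		case bv < av:
			out = append(out, bv)
			bOK = b.Next()
		default:
			out = append(out, av)
			aOK, bOK = a.Next(), b.Next()
		}
	}
	for ; aOK; aOK = a.Next() {
		out = append(out, a.At())
	}
	for ; bOK; bOK = b.Next() {
		out = append(out, b.At())
	}
	return out
}

func main() {
	a := &listPostings{refs: []uint64{1, 5, 9}}
	b := &listPostings{refs: []uint64{2, 5, 12}}
	fmt.Println(merge(a, b)) // [1 2 5 9 12]
}
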