22 Commits

Author SHA1 Message Date
Jesus Vazquez
c1b669bf9b
Add out-of-order sample support to the TSDB (#11075)
* Introduce out-of-order TSDB support

This implementation is based on this design doc:
https://docs.google.com/document/d/1Kppm7qL9C-BJB1j6yb6-9ObG3AbdZnFUBYPNNWwDBYM/edit?usp=sharing

This commit adds support to accept out-of-order ("OOO") sample into the TSDB
up to a configurable time allowance. If OOO is enabled, overlapping querying
are automatically enabled.

Most of the additions have been borrowed from
https://github.com/grafana/mimir-prometheus/
Here is the list ist of the original commits cherry picked
from mimir-prometheus into this branch:
- 4b2198d7ec47d50989b7c2df66b7b207c32f7f6e
- 2836e5513f1bc591535a859f5d41154a75e7c6bc
- 00b379c3a5b1ec3799699b6242f300a2b3ea30f0
- ff0dc757587cada63ca948d2d5eb00bf090d63e0
- a632c73352a7e39d60b445700beb47d691549c3e
- c6f3d4ab339ab80bbbce74c9946237ced01f0509
- 5e8406a1d4a50d0052bbee83e28ca3b3371408aa
- abde1e0ba128936b9eb0224ee1551e56216ebd4a
- e70e7698897bb03860bee0467c733fa44e14c9bd
- df59320886e03a555d379ac4b0b3130f661407e0

Co-authored-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* gofumpt files

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Add license header to missing files

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix OOO tests due to existing chunk disk mapper implementation

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix truncate int overflow

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Add Sync method to the WAL and update tests

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* remove useless sync

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Update minOOOTime after truncating Head

* Update minOOOTime after truncating Head

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix lint

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Add a unit test

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Load OutOfOrderTimeWindow only once per appender

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix OOO Head LabelValues and PostingsForMatchers

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix replay of OOO mmap chunks

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Remove unnecessary err check

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Prevent panic with ApplyConfig

Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Run OOO compaction after restart if there is OOO data from WBL

Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Apply Bartek's suggestions

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Refactor OOO compaction

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Address comments and TODOs

- Added a comment explaining why we need the allow overlapping
  compaction toggle
- Clarified TSDBConfig OutOfOrderTimeWindow doc
- Added an owner to all the TODOs in the code

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Run go format

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix remaining review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix tests

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Change wbl reference when truncating ooo in TestHeadMinOOOTimeUpdate

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>

* Fix TestWBLAndMmapReplay test failure on windows

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Address most of the feedback

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Refactor the block meta for out of order

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix windows error

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix review comments

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

Signed-off-by: Jesus Vazquez <jesus.vazquez@grafana.com>
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: Ganesh Vernekar 15064823+codesome@users.noreply.github.com
Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Dieter Plaetinck <dieter@grafana.com>
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2022-09-20 22:35:50 +05:30
Matthieu MOREL
e2ede285a2
refactor: move from io/ioutil to io and os packages (#10528)
* refactor: move from io/ioutil to io and os packages
* use fs.DirEntry instead of os.FileInfo after os.ReadDir

Signed-off-by: MOREL Matthieu <matthieu.morel@cnp.fr>
2022-04-27 11:24:36 +02:00
Mauro Stettler
b025390cb4
Disable chunk write queue by default, allow user to configure the exact size (#10425)
* Disable chunk write queue by default

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* update flag description

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>
2022-03-11 17:26:59 +01:00
Eng Zer Jun
3e67654d37
refactor: use T.TempDir() and B.TempDir to create temporary directory
The directory created by `T.TempDir()` and `B.TempDir()` is
automatically removed when the test and all its subtests complete.

Reference: https://pkg.go.dev/testing#T.TempDir
Reference: https://pkg.go.dev/testing#B.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-01-22 18:57:30 +08:00
Mauro Stettler
0df3489275
Write chunks via queue, predicting the refs (#10051)
* Write chunks via queue, predicting the refs

Our load tests have shown that there is a latency spike in the
remote write handler whenever the head chunks need to be written,
because chunkDiskMapper.WriteChunk() blocks until the chunks are written
to disk.

This adds a queue to the chunk disk mapper which makes the WriteChunk()
method non-blocking unless the queue is full. Reads can still be served
from the queue.

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* address PR feeddback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* initialize metrics without .Add(0)

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* change isRunningMtx to normal lock

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* do not re-initialize chunkrefmap

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* update metric outside of lock scope

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* add benchmark for adding job to chunk write queue

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* remove unnecessary "success" var

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* gofumpt -extra

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* avoid WithLabelValues call in addJob

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* format comments

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* addressing PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* rename cutExpectRef to cutAndExpectRef

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* use head.Init() instead of .initTime()

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* address PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* PR feedback

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* update test according to PR feedback

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* replace callbackWg -> awaitCb

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* better test of truncation with empty files

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

* replace callbackWg -> awaitCb

Signed-off-by: Mauro Stettler <mauro.stettler@gmail.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2022-01-10 13:36:45 +00:00
Dieter Plaetinck
cda025b5b5
TSDB: demistify SeriesRefs and ChunkRefs (#9536)
* TSDB: demistify seriesRefs and ChunkRefs

The TSDB package contains many types of series and chunk references,
all shrouded in uint types.  Often the same uint value may
actually mean one of different types, in non-obvious ways.

This PR aims to clarify the code and help navigating to relevant docs,
usage, etc much quicker.

Concretely:

* Use appropriately named types and document their semantics and
  relations.
* Make multiplexing and demuxing of types explicit
  (on the boundaries between concrete implementations and generic
  interfaces).
* Casting between different types should be free.  None of the changes
  should have any impact on how the code runs.

TODO: Implement BlockSeriesRef where appropriate (for a future PR)

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* agent: demistify seriesRefs and ChunkRefs

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-11-06 15:40:04 +05:30
Mateusz Gozdek
1a6c2283a3 Format Go source files using 'gofumpt -w -s -extra'
Part of #9557

Signed-off-by: Mateusz Gozdek <mgozdekof@gmail.com>
2021-11-02 19:52:34 +01:00
Dieter Plaetinck
d5afe0a577
TSDB: Use a dedicated head chunk reference type (#9501)
* Use dedicated Ref type

Throughout the code base, there are reference types masked as
regular integers.  Let's use dedicated types.  They are
equivalent, but clearer semantically.
This also makes it trivial to find where they are used,
and from uses, find the centralized docs.

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* postpone some work until after possible return

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* clarify

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* rename feedback

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>

* skip header is up to caller

Signed-off-by: Dieter Plaetinck <dieter@grafana.com>
2021-10-13 17:44:32 +05:30
Bartlomiej Plotka
8bf7bc68f1
Fixed TestChunkDiskMapper_WriteChunk_Chunk_IterateChunks for go1.16 (#8538)
Fixes https://github.com/prometheus/prometheus/issues/8403

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2021-02-25 14:38:12 +05:30
Marco Pracucci
db19e05d93
Add option to customise head chunks write buffer size (#8201)
* Add option to customise head chunks write buffer size

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-11-19 18:30:47 +05:30
Julien Pivotto
6c56a1faaa
Testify: move to require (#8122)
* Testify: move to require

Moving testify to require to fail tests early in case of errors.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* More moves

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-29 09:43:23 +00:00
Julien Pivotto
1282d1b39c
Refactor test assertions (#8110)
* Refactor test assertions

This pull request gets rid of assert.True where possible to use
fine-grained assertions.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:06:53 +01:00
Julien Pivotto
4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
Ganesh Vernekar
ce4b3ac282
Simplify TestHeadReadWriter_Truncate (#7437)
* Simplify TestHeadReadWriter_Truncate

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-10-21 18:41:39 +05:30
Ganesh Vernekar
2624d827fa
Read repair empty last file in chunks_head (#8061)
* Read repair empty file in chunks_head

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Refactor and introduce repairLastChunkFile

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Attempt windows test fix

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-10-21 18:27:13 +05:30
garanews
c38816828f
fix few typo (#8023)
Signed-off-by: garanews <puntogtg@tiscali.it>
2020-10-07 16:51:31 +01:00
Jorge Vallecillo
7aa5fb01bf
tsdb/chunks/head_chunks_test.go: Fix typo (#7953)
tsdb/db_test.go: Fix typo

Signed-off-by: Jorge Vallecillo <jorgevallecilloc@gmail.com>
2020-09-20 18:42:01 +01:00
Ganesh Vernekar
c806262206
Fix 'chunks.HeadReadWriter: maxt of the files are not set' error (#7856)
* Fix chunks.HeadReadWriter: maxt of the files are not set

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-08-26 19:59:18 +02:00
Ganesh Vernekar
48fae12b89
Fix unsequential m-map files (#7414)
* Fix unsequential m-map files

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-06-18 19:24:58 +05:30
Ganesh Vernekar
a1355eb7c7
Remove time based m-map file creation (#7314)
* Remove time based m-map file creation

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-05-29 20:08:41 +05:30
Ganesh Vernekar
d4b9fe801f
M-map full chunks of Head from disk (#6679)
When appending to the head and a chunk is full it is flushed to the disk and m-mapped (memory mapped) to free up memory

Prom startup now happens in these stages
 - Iterate the m-maped chunks from disk and keep a map of series reference to its slice of mmapped chunks.
- Iterate the WAL as usual. Whenever we create a new series, look for it's mmapped chunks in the map created before and add it to that series.

If a head chunk is corrupted the currpted one and all chunks after that are deleted and the data after the corruption is recovered from the existing WAL which means that a corruption in m-mapped files results in NO data loss.

[Mmaped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md)  - main difference is that the chunk for mmaping now also includes series reference because there is no index for mapping series to chunks.
[The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index which includes the offsets for the chunks in the chunks file - example - chunks of series ID have offsets 200, 500 etc in the chunk files.
In case of mmaped chunks, the offsets are stored in memory and accessed from that. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above by matching the series id present in the chunk header and offset of that chunk in that file.

**Prombench results**

_WAL Replay_

1h Wal reply time
30% less wal reply time - 4m31 vs 3m36
2h Wal reply time
20% less wal reply time - 8m16 vs 7m

_Memory During WAL Replay_

High Churn:
10-15% less RAM -  32gb vs 28gb
20% less RAM after compaction 34gb vs 27gb
No Churn:
20-30% less RAM -  23gb vs 18gb
40% less RAM after compaction 32.5gb vs 20gb

Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932)


Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-05-06 21:00:00 +05:30
Ganesh Vernekar
e50fdbc70c
Live m-mapping of chunks on disk (#6830)
* Live m-mapping of chunks on disk

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments Part 2

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments Part 3

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments Part 4

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Attempt to fix windows bug

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-03-19 22:03:44 +05:30