281 Commits

Author SHA1 Message Date
Marcelo E. Magallon
75d86c6747 Update golangci-lint to 1.36.0
In the previous version, 1.18.0, the "megacheck" linter paid attention
to the '//lint:ignore' comment, but that is no longer there.

Newer version pay attention to '//nolint:<linter>,<linter>,...'
comments, optionally followed by a "second" comment introduced by '//'.

Update the directives to use this style.

This is related to prometheus/blackbox_exporter#738 and
prometheus/blackbox_exporter#745.

Signed-off-by: Marcelo E. Magallon <marcelo.magallon@grafana.com>
2021-02-04 08:53:33 -06:00
Marco Pracucci
d8c17025df
Fix TSDB head struct dump on querier error (#8379)
* Fix TSDB head struct dump on querier error

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added mint/maxt to RangeHead.String()

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2021-01-21 16:07:29 +05:30
johncming
a6e18916ab
tsdb: Remove duplicate variables. (#8239)
Signed-off-by: johncming <johncming@yahoo.com>
2020-11-30 08:55:33 +00:00
Ganesh Vernekar
dff967286e
Set the min time of Head properly after truncation (#8212)
* Set the min time of Head properly after truncation

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix lint

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Enhance compaction plan logic for completely deleted small block

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-11-25 18:33:30 +05:30
Marco Pracucci
db19e05d93
Add option to customise head chunks write buffer size (#8201)
* Add option to customise head chunks write buffer size

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2020-11-19 18:30:47 +05:30
Julien Pivotto
8bc369bf9b
Calculate head chunk size based on actual disk usage (#8139)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-11-03 15:34:59 +05:30
Bartlomiej Plotka
3d8826a3d4
MultiError: Refactored MultiError for more concise and safe usage. (#8066)
* MultiError: Refactored MultiError for more concise and safe usage.

* Less lines
* Goland IDE was marking every usage of old MultiError "potential nil" error
* It was easy to forgot using Err() when error was returned, now it's safely assured on compile time.

NOTE: Potentially I would rename package to merrors. (: In different PR.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed review comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fix after rebase.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-10-28 15:24:58 +00:00
Julien Pivotto
4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
Bartlomiej Plotka
2fe1e9fa93
Create a checkpoint only at the end of Compact call (#8067)
* Create a checkpoint only at the end of Compact call

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix review comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix Bartek's offline reviews

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Introduce TruncateInMemory and TruncateWAL

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Small enhancements and test fixing attempts

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix tests

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Add TestOneCheckpointPerCompactCall

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Don't truncate WAL on block compaction

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Simplified the algo.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Better protection around calling truncateWAL, truncate WAL on Head compaction error

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-10-19 20:57:08 +05:30
Ling Jin
9145200842
tsdb: fix unkown ref in log (#8048)
Signed-off-by: JinLingChristopher <jinl1037@hotmail.com>
2020-10-13 20:03:16 +05:30
Arthur Silva Sens
4f45e201cc
Promtool tsdb list now prints block sizes (#7993)
* promtool tsdb list now prints blocks' size

Signed-off-by: arthursens <arthursens2005@gmail.com>
2020-10-12 23:15:40 +02:00
garanews
c38816828f
fix few typo (#8023)
Signed-off-by: garanews <puntogtg@tiscali.it>
2020-10-07 16:51:31 +01:00
Brian Brazil
073e93c768
Gracefully handle unknown WAL record types. (#8004)
As we're looking to expand what's in the WAL,
having old Prometheus servers ignore the new record types
rather than treating them as corruption allows for better
upgrade/downgrade paths.

Adjust some tests accordingly, so they're still testing what they're
meant to test.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-10-05 10:09:59 +01:00
Max Neverov
7e1c27b853
Add tsdb startup duration metric (#7737)
* Add tsdb wal replay duration metric

Signed-off-by: Max Neverov <neverov.max@gmail.com>
2020-09-21 18:25:05 +02:00
Xiaochao Dong
a282d25099
tsdb: remove duplicate values set to reduce memory usage(map overhead) (#7915)
Signed-off-by: Xiaochao Dong (@damnever) <dxc.wolf@gmail.com>
2020-09-10 20:35:47 +05:30
johncming
75ae384192
tsdb: remove redundant fields. (#7869)
Signed-off-by: johncming <johncming@yahoo.com>
2020-09-02 17:03:21 +01:00
Ganesh Vernekar
2255b6f62f
Refactor WAL.Segments method to be part of the wal package (#6477)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-09-01 14:46:57 +05:30
Ganesh Vernekar
c806262206
Fix 'chunks.HeadReadWriter: maxt of the files are not set' error (#7856)
* Fix chunks.HeadReadWriter: maxt of the files are not set

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-08-26 19:59:18 +02:00
Yukun Sun
cfd4e05c9e
fix: return a corruption error when iterator function find a chunk that is out of sequence (#7855)
Signed-off-by: sunyukun <sunyukun@didiglobal.com>

Co-authored-by: sunyukun <sunyukun@didiglobal.com>
2020-08-26 20:36:27 +05:30
Zhou Hao
40ace418d1
fix misspell (#7764)
Signed-off-by: Zhou Hao <zhouhao@cn.fujitsu.com>
2020-08-07 08:57:25 +01:00
johncming
ac677ed8b3
promql: delete redundant return value. (#7721)
Signed-off-by: johncming <johncming@yahoo.com>
2020-08-03 10:45:53 +01:00
Bartlomiej Plotka
e6d7cc5fa4
tsdb: Added ChunkQueryable implementations to db; unified MergeSeriesSets and vertical to single struct. (#7069)
* tsdb: Added ChunkQueryable implementations to db; unified compactor, querier and fanout block iterating.

Chained to https://github.com/prometheus/prometheus/pull/7059

* NewMerge(Chunk)Querier now takies multiple primaries allowing tsdb DB code to use it.
* Added single SeriesEntry / ChunkEntry for all series implementations.
* Unified all vertical, and non vertical for compact and querying to single
merge series / chunk sets by reusing VerticalSeriesMergeFunc for overlapping algorithm (same logic as before)
* Added block (Base/Chunk/)Querier for block querying. We then use populateAndTomb(Base/Chunk/) to iterate over chunks or samples.
* Refactored endpoint tests and querier tests to include subtests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed comments from Brian and Beorn.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed snapshot test and added chunk iterator support for DBReadOnly.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed race when iterating over Ats first.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed populate block tests.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed endpoints test.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed test.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added test & fixed case of head open chunk.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed DBReadOnly tests and bug producing 1 sample chunks.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added cases for partial block overlap for multiple full chunks.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Added extra tests for chunk meta after compaction.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Fixed small vertical merge bug and added more tests for that.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-07-31 16:03:02 +01:00
Annanay
9bba8a6eae Merge branch 'master' into appender-context
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-30 16:43:18 +05:30
Annanay
89129cd39a Address comments
Signed-off-by: Annanay <annanayagarwal@gmail.com>
2020-07-30 16:41:13 +05:30
Javier Palomo Almena
348ff4285f
tsdb: Replace sync/atomic with uber-go/atomic in tsdb (#7659)
* tsdb/chunks: Replace sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* tsdb/heaad: Replace sync/atomic with uber-go/atomic

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* vendor: Make go.uber.org/atomic a direct dependency

There is no modifications to go.sum and vendor/ because
it was already vendored.

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>

* tsdb: Remove comments referring to the sync/atomic alignment bug

Related: https://golang.org/pkg/sync/atomic/#pkg-note-BUG

Signed-off-by: Javier Palomo <javier.palomo.almena@gmail.com>
2020-07-28 10:12:42 +05:30
Julien Pivotto
ffc925dd21
TSDB: Error when we commit/rollback twice (#7593)
* TSDB: Error when we commit/rollback twice

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 11:57:38 +02:00
Krasimir Georgiev
ccab2b30c9
Test no panic after a WAL corruption (#7625)
* no panic the head memseries has chunks in it

Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>

* fix a panic when querying after a wal corruption.

Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>

* review nits

Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>

* Add test for reading the data after a wal corruption.

Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>

Update tsdb/db_test.go

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>

Update tsdb/db_test.go

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>

* spellings

Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>

Co-authored-by: Ganesh Vernekar <15064823+codesome@users.noreply.github.com>
2020-07-21 12:32:13 +05:30
Krasi Georgiev
d30492cbb0 Avoid panic when the headChunk is nil during isolation.
Signed-off-by: Krasi Georgiev <8903888+krasi-georgiev@users.noreply.github.com>
2020-07-20 18:23:18 +03:00
Ganesh Vernekar
1760c7474c
Replay m-map chunks irrespective of WAL (#7589)
* Replay m-map chunks irrespective of WAL

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* More logs

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-07-16 18:34:08 +05:30
Ganesh Vernekar
ea013343ca
Log when starting to create a checkpoint (#7581)
Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-07-15 19:15:37 +05:30
Bartlomiej Plotka
823b218e1b
Fixed race between compact (gc, populate) and head append causing unknown symbol error. (#7560)
* Fixed race between compact (gc, populate) and head append causing unknown symbol error.

Fixes https://github.com/prometheus/prometheus/issues/7373

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-07-14 09:36:22 +01:00
Bartlomiej Plotka
492061b24c
Revert "Fix unknown symbol error during head compaction (#7526)" (#7556)
This reverts commit 30505a202a4c33ceeb10bfb3ba01e371a2a10906.
2020-07-11 22:37:16 +05:30
Ganesh Vernekar
30505a202a
Fix unknown symbol error during head compaction (#7526)
* Fix race during head compaction

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Comment out the test

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Skip test instead of commenting it out

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-07-07 17:29:09 +05:30
Harkishen Singh
f32307b656
Increments WAL corruption metric on WAL corruption during checkpointing (#7491)
* Increments wal corruption metric on error during checkpointing

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>

* check for wal corruption error

Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
2020-07-05 11:25:42 +05:30
Ganesh Vernekar
082c17b691
Introduce SortedLabelValues/LabelValues to speedup queries for high cardinality (#7448)
* Introduce LabelValuesUnsorted to speedup queries for high cardinality

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Add sort check

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-06-25 14:10:29 +01:00
Peter Štibraný
ff80690a6e
Optimise lowWatermark in Isolation (#7332)
* Track open appenders in doubly-linked list to make lowWatermark O(1).
* Use RW locks.
* Added BenchmarkIsolationWithState.

Signed-off-by: Peter Štibraný <peter.stibrany@grafana.com>
2020-06-03 20:09:05 +02:00
Jess G
fdc49fae5b
Added time range parameters to labelNames API (#7288)
* add time range params to labelNames api

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* evaluate min/max time range when reading labels from the head

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* add time range params to labelValues api

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* fix test, add docs

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* add a test for head min max range

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* fix test to match comment

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* address CR comments

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* combine vars only used once

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* add time range params to labelNames api

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* evaluate min/max time range when reading labels from the head

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* add time range params to labelValues api

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* fix test, add docs

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* add a test for head min max range

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* fix test to match comment

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* address CR comments

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* combine vars only used once

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* fix test

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* restart ci

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>

* use range expectedLabelNames instead of range actualLabelNames in test

Signed-off-by: jessicagreben <Jessica.greben1+github@gmail.com>
2020-05-30 13:50:09 +01:00
Krasimir Georgiev
09df8d94e0
More explicit chunks and head error handling. (#7277) 2020-05-22 12:03:23 +03:00
Ganesh Vernekar
1c99adb9fd
Callbacks for lifecycle of series in TSDB (#7159)
* Callbacks for lifecycle of series in TSDB

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Add more comments

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-05-20 18:52:08 +05:30
Ganesh Vernekar
d4b9fe801f
M-map full chunks of Head from disk (#6679)
When appending to the head and a chunk is full it is flushed to the disk and m-mapped (memory mapped) to free up memory

Prom startup now happens in these stages
 - Iterate the m-maped chunks from disk and keep a map of series reference to its slice of mmapped chunks.
- Iterate the WAL as usual. Whenever we create a new series, look for it's mmapped chunks in the map created before and add it to that series.

If a head chunk is corrupted the currpted one and all chunks after that are deleted and the data after the corruption is recovered from the existing WAL which means that a corruption in m-mapped files results in NO data loss.

[Mmaped chunks format](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/head_chunks.md)  - main difference is that the chunk for mmaping now also includes series reference because there is no index for mapping series to chunks.
[The block chunks](https://github.com/prometheus/prometheus/blob/master/tsdb/docs/format/chunks.md) are accessed from the index which includes the offsets for the chunks in the chunks file - example - chunks of series ID have offsets 200, 500 etc in the chunk files.
In case of mmaped chunks, the offsets are stored in memory and accessed from that. During WAL replay, these offsets are restored by iterating all m-mapped chunks as stated above by matching the series id present in the chunk header and offset of that chunk in that file.

**Prombench results**

_WAL Replay_

1h Wal reply time
30% less wal reply time - 4m31 vs 3m36
2h Wal reply time
20% less wal reply time - 8m16 vs 7m

_Memory During WAL Replay_

High Churn:
10-15% less RAM -  32gb vs 28gb
20% less RAM after compaction 34gb vs 27gb
No Churn:
20-30% less RAM -  23gb vs 18gb
40% less RAM after compaction 32.5gb vs 20gb

Screenshots are in [this comment](https://github.com/prometheus/prometheus/pull/6679#issuecomment-621678932)


Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-05-06 21:00:00 +05:30
Ben Ye
1e4e37144d
Fixed wrongly handled not ready TSDB on web and API. (#7182)
* fix federate endpoint panic

Signed-off-by: yeya24 <yb532204897@gmail.com>

* Fixed all cases of not ready TSDB being wrongly handled.

* Fixed issue for federation.
* Ensured this will never happen again thanks to interfaces
* Fixes same issue for stats.
* Added tests for readiness.
* Fixed bug in stats. It was:
   status.MaxTime = db.Head().MaxTime()
   status.MinTime = db.Head().MaxTime()


Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* Addressed Brian's comments.

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2020-04-29 17:16:14 +01:00
Julien Pivotto
fc3fb3265a
Merge pull request #7145 from prometheus/release-2.17
Backport release 2.17 into master
2020-04-20 14:08:12 +02:00
Julien Pivotto
ed1852ab95
TSDB: Isolation: avoid creating appenderId's without appender (#7135)
Prior to this commit we could have situations where we are creating an
appenderId but never creating an appender to go with it, therefore
blocking the low watermak.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-04-17 20:51:03 +02:00
Marek Slabicki
8224ddec23
Capitalizing first letter of all log lines (#7043)
Signed-off-by: Marek Slabicki <thaniri@gmail.com>
2020-04-11 09:22:18 +01:00
Brian Brazil
cd73b3d33e
Reduce how much old WAL we keep around. (#7098)
Previously we were keeping up to around 6 hours of WAL around by
removing 1/3 every hours. This was excessive, so switch to removing 2/3
which will up to around 3 hours of WAL around.

This will roughly halve the size of the WAL and halve startup time for
those who are I/O bound. This may increase the checkpoint size for
those with certain churn patterns, but by much less than we're saving
from the segments.

Signed-off-by: Brian Brazil <brian.brazil@robustperception.io>
2020-04-07 15:55:57 +05:30
Julien Pivotto
ceef10cee4 Reset comment
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-26 00:17:56 +01:00
Julien Pivotto
653f343547 Revert head posting optimization
This reverts commit 52630ad0c735f2dce4ce5bb851acb6c5d7df5eb1.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-25 20:19:33 +01:00
Julien Pivotto
8907ba6235 Make TSDB use storage errors
This fixes #6992, which was introduced by #6777. There was an
intermediate component which translated TSDB errors into storage errors,
but that component was deleted and this bug went unnoticed, until we
were watching at the Prombench results. Without this, scrape will fail
instead of dropping samples or using "Add" when the series have been
garbage collected.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-17 22:24:25 +01:00
Julien Pivotto
0f9e78bd88
tsdb: fix races around head chunks (#6985)
* tsdb: fix races around head chunks

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-03-16 13:59:22 +01:00
Björn Rabenstein
d80b0810c1
Move crucial actions to defer (#6918)
With defer having less of a performance penalty, there is no reason
not to do those crucial operations via defer.

Context: With isolation in place, if we forget to Commit/Rollback, the
low watermark will get stuck forever.

The current code should not have any bugs, but moving to defer helps
to avoid future bugs.

This is also moving the `closeAppend` in the `Commit` implementation
itself to defer. If logging to the WAL fails, we would have missed the
`closeAppend`.

Signed-off-by: beorn7 <beorn@grafana.com>
2020-03-13 20:54:47 +01:00