conf.StorageConfig.TSDBConfig.Retention.Time is model.Duration which is
type-aliased to time.Duration (nanoseconds), but RetentionDuration is int64
in milliseconds. The missing division by time.Millisecond caused the metric
prometheus_tsdb_retention_limit_seconds to be reported 1e6 times too large
after a config reload.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Replace polling loops (for range 100 { time.Sleep }) with explicit
db.Compact() calls after disabling background compaction, eliminating
CI flakiness on slow machines. Also fix incorrect overlap assertions
that were checking the wrong direction (LessOrEqual -> GreaterOrEqual).
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Initial implementation of https://github.com/prometheus/prometheus/issues/17790.
Only implements ST-per-sample for Counters. Tests and benchmarks updated.
Note: This increases the size of the RefSample object for all users, whether st-per-sample is turned on or not.
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Guard the stale series ratio calculation by checking numSeries > 0
before computing the ratio. This prevents division by zero when
the head has no series.
Fixes#17949
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
find . -name "*.go" -type f -exec sed -E -i \
's/([^[:alpha:]]sample\{)([^,{:]+,[^,]+,[^,]+,[^,]+\})/\10, \2/g' {} +
I've omitted tsdb/ooo_head.go from the commit because I'm also adding todo
there.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
For tests only, we had various ways of opening DB. Reduced to one
instead of:
* Open
* newTestDB
* newTestDBOpts
* openTestDB
This so https://github.com/prometheus/prometheus/pull/17629 is smaller
and bit easier. Also for test maintainability and consistency.
Signed-off-by: bwplotka <bwplotka@gmail.com>
The detailed plan for this is laid out in
https://github.com/prometheus/prometheus/issues/16572 .
This commit adds a global and local scrape config option
`scrape_native_histograms`, which has to be set to true to ingest
native histograms.
To ease the transition, the feature flag is changed to simply set the
default of `scrape_native_histograms` to true.
Further implications:
- The default scrape protocols now depend on the
`scrape_native_histograms` setting.
- Everywhere else, histograms are now "on by default".
Documentation beyond the one for the feature flag and the scrape
config are deliberately left out. See
https://github.com/prometheus/prometheus/pull/17232 for that.
Signed-off-by: beorn7 <beorn@grafana.com>
Reduce the resolution of histograms as needed and ignore invalid
schemas while emitting a warning log.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Fixes https://github.com/prometheus/prometheus/issues/15177
The basic idea here is to divide the samples to be commited into (sub)
batches whenever we detect that the same series receives a sample of a
type different from the previous one. We then commit those batches one
after another, and we log them to the WAL one after another, so that
we hit both birds with the same stone. The cost of the stone is that
we have to track the sample type of each series in a map. Given the
amount of things we already track in the appender, I hope that it
won't make a dent. Note that this even addresses the NHCB special case
in the WAL.
This does a few other things that I could not resist to pick up on the
go:
- It adds more zeropool.Pools and uses the existing ones more
consistently. My understanding is that this was merely an oversight.
Maybe the additional pool usage will compensate for the increased
memory demand of the map.
- Create the synthetic zero sample for histograms a bit more
carefully. So far, we created a sample that always went into its own
chunk. Now we create a sample that is compatible enough with the
following sample to go into the same chunk. This changed the test
results quite a bit. But IMHO it makes much more sense now.
- Continuing past efforts, I changed more namings of `Samples` into
`Floats` to keep things consistent and less confusing. (Histogram
samples are also samples.) I still avoided changing names in other
packages.
- I added a few shortcuts `h := a.head`, saving many characters.
TODOs:
- Address @krajorama's TODOs about commit order and staleness handling.
Signed-off-by: beorn7 <beorn@grafana.com>
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.
This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency what
was done automatically.
Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.
Signed-off-by: beorn7 <beorn@grafana.com>
* refactor: simplify error handling and remove redundant checks
Signed-off-by: Ryan Wu <rongjun0821@gmail.com>
* Add the comment for return of reloading blocks failure
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Ryan Wu <rongjun0821@gmail.com>
* Add the comment for return of reloading blocks failure
Signed-off-by: Ryan Wu <rongjun0821@gmail.com>
---------
Signed-off-by: Ryan Wu <rongjun0821@gmail.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
* Remove experimental out-of-order native histogram flag
This feature has been available in Prometheus since September 2024,
and has no known issues. Therefore proposing to remove the flag
entirely and always have it on. Note that there are still two
settings that need to be configured (out-of-order time window > 0
and native histograms enabled) for this feature to work.
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
* Update CHANGELOG
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
* Keep feature flag with warning
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
* Update CHANGELOG
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
* Update tsdb/head_append.go
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Update CHANGELOG.md
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Update tsdb/head_append.go
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Additional cleanup of comments and test names
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
---------
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Renames the head's deleted map to walExpiries, and creates entries for any
duplicate series records encountered during WAL replay, with the expiry set
to the highest current WAL segment number. Any subsequent WAL
checkpoints will see the duplicate series entry in the walExpiries map, and
keep the series record until the last WAL segment that could contain its
samples is deleted.
Other considerations:
WBL: series records aren't written to the WBL, so there are no duplicates to deal with
agent mode: has its own WAL replay logic that handles duplicate series records differently, and is outside the scope of this PR
'defer' only runs at the end of the function, so introduce some more
functions / move the start, so that 'defer' can run at the end of the
logical block.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This test ensures that running db.reloadBlocks() and db.CleanTombstones() at the same time doesn't race.
The problem is that CleanTombstones() is a public method while reloadBlocks() is internal.
CleanTombstones() sets db.cmtx lock while reloadBlocks() is not protected by any locks at all, it expects the public method through which it was called to do it.
So having a race between these two is not unexpected and we shouldn't really be testing this.
db.cmtx ensures that no other function can be modifying the list of open blocks and so the scenario tested here cannot happen.
If it would happen it would be only because some other method doesn't aquire db.ctmx lock, something this test cannot detect.
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>