For tests only, we had various ways of opening DB. Reduced to one
instead of:
* Open
* newTestDB
* newTestDBOpts
* openTestDB
This so https://github.com/prometheus/prometheus/pull/17629 is smaller
and bit easier. Also for test maintainability and consistency.
Signed-off-by: bwplotka <bwplotka@gmail.com>
* Delay compactions until Thanos uploads all blocks
Using Thanos sidecar with Prometheus requires us to disable TSDB compactions on Prometheus side by setting --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the same value. See https://thanos.io/tip/components/sidecar.md. The main problem this avoids is that Prometheus might compact given block before Thanos uploads it, creating a gap in Thanos metrics. Thanos does not upload compacted blocks because that would upload the same sample multiple times. You can tell Thanos to upload compacted blocks but that is aimed at one time migrations. This patch creates a bridge between Thanos and Prometheus by allowing Prometheus to read the shipper file Thanos creates, where it tracks which blocks were already uploaded, and using that data delays compaction of blocks until they are marked as uploaded by Thanos. Thanks to this both services can coordinate with each other (in a way) and we can stop disabling compaction on Prometheus side when Thanos uploads are enabled.
The reason to have this is that disabling compactions have very dramatic performance cost. Since most time series exist for longer than a single block duration (2h by default) large chunks of block index will reference the same series, so 10 * 2h blocks will each have an index that is usually fairly big and is almost the same for all 10 blocks. Compaction de-duplicates the index so merging 10 blocks together would leave us with a single index that is around the same size as each of these 10 2h blocks would have (plus some extra for series that only exists in some blocks, but not all). Every range query that iterates over all 10 blocks would then have to read each index and so we're doing 10x more work then if we had a single compacted block.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* Rename structs and functions to make this more generic
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* Address review comments
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* Cache UploadMeta for 1 minute
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
---------
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
ReduceResolution is currently called before validation during
ingestion. This will cause a panic if there are not enough buckets in
the histogram. If there are too many buckets, the spurious buckets are
ignored, and therefore the error in the input histogram is masked.
Furthermore, invalid negative offsets might cause problems, too.
Therefore, we need to do some minimal validation in reduceResolution.
Fortunately, it is easy and shouldn't slow things down. Sadly, it
requires to return errors, which triggers a bunch of code changes.
Even here is a bright side, we can get rud of a few panics. (Remember:
Don't panic!)
In different news, we haven't done a full validation of histograms
read via remote-read. This is not so much a security concern (as you
can throw off Prometheus easily by feeding it bogus data via
remote-read) but more that remote-read sources might be makeshift and
could accidentally create invalid histograms. We really don't want to
panic in that case. So this commit does not only add a check of the
spans and buckets as needed for resolution reduction but also a full
validation during remote-read.
Signed-off-by: beorn7 <beorn@grafana.com>
Partially fixes https://github.com/prometheus/prometheus/issues/17416 by
renaming all CT* names to ST* in the whole codebase except RW2 (this is
done in separate
[PR](https://github.com/prometheus/prometheus/pull/17411)) and
PrometheusProto exposition proto.
```
CreatedTimestamp -> StartTimestamp
CreatedTimeStamp -> StartTimestamp
created_timestamp -> start_timestamp
CT -> ST
ct -> st
```
Signed-off-by: bwplotka <bwplotka@gmail.com>
It is fairly common that the next item is the one we want, and cheap
to check.
We could also start the binary search one position on, but strangely
that slows it down.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Reduce tsdb.TestHeadSeriesChunkRace number of iterations from 1000 to
100, to stop this test from timing out under CI.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
This callback is not used by Prometheus, but in downstream projects it
is wasteful to allocate an ID only to abandon it.
Remove lengthy commment which I feel is distracting from the flow.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Refactor the code so that everything proceeds linearly.
Also renamed `getOrSet` to `setUnlessAlreadySet` to emphasise that the
caller is expecting it not to be set.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
The detailed plan for this is laid out in
https://github.com/prometheus/prometheus/issues/16572 .
This commit adds a global and local scrape config option
`scrape_native_histograms`, which has to be set to true to ingest
native histograms.
To ease the transition, the feature flag is changed to simply set the
default of `scrape_native_histograms` to true.
Further implications:
- The default scrape protocols now depend on the
`scrape_native_histograms` setting.
- Everywhere else, histograms are now "on by default".
Documentation beyond the one for the feature flag and the scrape
config are deliberately left out. See
https://github.com/prometheus/prometheus/pull/17232 for that.
Signed-off-by: beorn7 <beorn@grafana.com>
* fix: Fix slicelabels corruption when used with proto decoding
Alternative to https://github.com/prometheus/prometheus/pull/16957/
Signed-off-by: bwplotka <bwplotka@gmail.com>
* addressed comments
Signed-off-by: bwplotka <bwplotka@gmail.com>
---------
Signed-off-by: bwplotka <bwplotka@gmail.com>
While investigating +10% CPU in v3.7 release, found that ~5% is from
expanding the types map. Try reuse.
Also fix a linter error.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
While investigating +10% CPU in v3.7 release, found that ~5% is from
expanding the types map. Try reuse.
Also fix a linter error.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Reduce the resolution of histograms as needed and ignore invalid
schemas while emitting a warning log.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Allow -9..52 schemas instead of just -4..8, but reduce resolution to 8 if
above.
The reduce code path will be slow, but we only expect it to happen if
TSDB already has higher resolution samples and we are in a rollback.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
# Conflicts:
# model/histogram/generic.go
Otherwise higher level code like PromQL needs to constantly check if it
can handle the samples.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
With the fixed commit order, we can now handle the conversion of float
staleness markers to histogram staleness markers in a more direct way.
Signed-off-by: beorn7 <beorn@grafana.com>
Fixes https://github.com/prometheus/prometheus/issues/15177
The basic idea here is to divide the samples to be commited into (sub)
batches whenever we detect that the same series receives a sample of a
type different from the previous one. We then commit those batches one
after another, and we log them to the WAL one after another, so that
we hit both birds with the same stone. The cost of the stone is that
we have to track the sample type of each series in a map. Given the
amount of things we already track in the appender, I hope that it
won't make a dent. Note that this even addresses the NHCB special case
in the WAL.
This does a few other things that I could not resist to pick up on the
go:
- It adds more zeropool.Pools and uses the existing ones more
consistently. My understanding is that this was merely an oversight.
Maybe the additional pool usage will compensate for the increased
memory demand of the map.
- Create the synthetic zero sample for histograms a bit more
carefully. So far, we created a sample that always went into its own
chunk. Now we create a sample that is compatible enough with the
following sample to go into the same chunk. This changed the test
results quite a bit. But IMHO it makes much more sense now.
- Continuing past efforts, I changed more namings of `Samples` into
`Floats` to keep things consistent and less confusing. (Histogram
samples are also samples.) I still avoided changing names in other
packages.
- I added a few shortcuts `h := a.head`, saving many characters.
TODOs:
- Address @krajorama's TODOs about commit order and staleness handling.
Signed-off-by: beorn7 <beorn@grafana.com>