Fixes https://github.com/prometheus/prometheus/issues/15177
The basic idea here is to divide the samples to be commited into (sub)
batches whenever we detect that the same series receives a sample of a
type different from the previous one. We then commit those batches one
after another, and we log them to the WAL one after another, so that
we hit both birds with the same stone. The cost of the stone is that
we have to track the sample type of each series in a map. Given the
amount of things we already track in the appender, I hope that it
won't make a dent. Note that this even addresses the NHCB special case
in the WAL.
This does a few other things that I could not resist to pick up on the
go:
- It adds more zeropool.Pools and uses the existing ones more
consistently. My understanding is that this was merely an oversight.
Maybe the additional pool usage will compensate for the increased
memory demand of the map.
- Create the synthetic zero sample for histograms a bit more
carefully. So far, we created a sample that always went into its own
chunk. Now we create a sample that is compatible enough with the
following sample to go into the same chunk. This changed the test
results quite a bit. But IMHO it makes much more sense now.
- Continuing past efforts, I changed more namings of `Samples` into
`Floats` to keep things consistent and less confusing. (Histogram
samples are also samples.) I still avoided changing names in other
packages.
- I added a few shortcuts `h := a.head`, saving many characters.
TODOs:
- Address @krajorama's TODOs about commit order and staleness handling.
Signed-off-by: beorn7 <beorn@grafana.com>
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.
This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency what
was done automatically.
Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.
Signed-off-by: beorn7 <beorn@grafana.com>
This removes the stringlabels build tag, makes that implementation the default one, and moves the old labels implementation under the slicelabels build tag.
Fixes#16064.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
This test consistently fails missing ~10 series.
If it doesn't fail on your machine, just increase totalSeries, that's
how race conditions work.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Remove experimental out-of-order native histogram flag
This feature has been available in Prometheus since September 2024,
and has no known issues. Therefore proposing to remove the flag
entirely and always have it on. Note that there are still two
settings that need to be configured (out-of-order time window > 0
and native histograms enabled) for this feature to work.
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
* Update CHANGELOG
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
* Keep feature flag with warning
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
* Update CHANGELOG
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
* Update tsdb/head_append.go
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Update CHANGELOG.md
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Update tsdb/head_append.go
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Additional cleanup of comments and test names
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
---------
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Renames the head's deleted map to walExpiries, and creates entries for any
duplicate series records encountered during WAL replay, with the expiry set
to the highest current WAL segment number. Any subsequent WAL
checkpoints will see the duplicate series entry in the walExpiries map, and
keep the series record until the last WAL segment that could contain its
samples is deleted.
Other considerations:
WBL: series records aren't written to the WBL, so there are no duplicates to deal with
agent mode: has its own WAL replay logic that handles duplicate series records differently, and is outside the scope of this PR
'defer' only runs at the end of the function, so explicitly close the
querier after we finish with it. Also check it didn't error.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This enables it to take advantage of a more compact data structure
since all postings are known to be `*ListPostings`.
Remove the `Get` member which was not used for anything else, and fix up
tests.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Always return unknown hint for first sample in non-gauge histogram chunk
---------
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
When handling recoded histogram chunks the min time of the chunk is
updated by mistake. It should only update when the chunk is completely
new.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Two Appenders race when creating a series with a native histogram
as the memSeries will be common and the lastHistogram field is written
without lock.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>