ReduceResolution is currently called before validation during
ingestion. This will cause a panic if there are not enough buckets in
the histogram. If there are too many buckets, the spurious buckets are
ignored, and therefore the error in the input histogram is masked.
Furthermore, invalid negative offsets might cause problems, too.
Therefore, we need to do some minimal validation in reduceResolution.
Fortunately, it is easy and shouldn't slow things down. Sadly, it
requires returning errors, which triggers a bunch of code changes.
But even here there is a bright side: we can get rid of a few panics.
(Remember: Don't panic!)
In other news, we haven't done a full validation of histograms
read via remote-read so far. This is not so much a security concern
(you can easily throw off Prometheus anyway by feeding it bogus data
via remote-read) but rather that remote-read sources might be
makeshift and could accidentally create invalid histograms. We really
don't want to panic in that case. So this commit not only adds a check
of the spans and buckets as needed for resolution reduction but also a
full validation during remote-read.
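For illustration, here is a minimal sketch of that spans-and-buckets check, with made-up names rather than the actual code in model/histogram: the total length of the spans has to match the number of buckets, and span offsets after the first one must not be negative.

```go
package main

import "fmt"

// Span mirrors the exposition model: skip Offset bucket indices, then
// Length consecutive populated buckets follow.
type Span struct {
	Offset int32
	Length uint32
}

// checkSpans returns an error instead of letting the resolution
// reduction run into an out-of-range access later on.
func checkSpans(spans []Span, numBuckets int) error {
	var total uint32
	for i, s := range spans {
		if i > 0 && s.Offset < 0 {
			return fmt.Errorf("span %d has negative offset %d", i, s.Offset)
		}
		total += s.Length
	}
	if int(total) != numBuckets {
		return fmt.Errorf("spans need %d buckets, got %d", total, numBuckets)
	}
	return nil
}

func main() {
	spans := []Span{{Offset: 0, Length: 2}, {Offset: 1, Length: 3}}
	fmt.Println(checkSpans(spans, 4)) // too few buckets: an error, not a panic
}
```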
Signed-off-by: beorn7 <beorn@grafana.com>
Currently, iterating over histogram buckets can panic if the spans are
not consistent with the buckets. We aim to validate histograms upon
ingestion, but there might still be data corruption on disk that
could trigger the panic. While data corruption on disk is really bad
and will lead to all kinds of weirdness, we should still avoid
panicking.
Note, though, that chunks are secured by checksums, so such corruption
won't realistically happen because of disk faults. It is more likely
that a chunk was generated in a faulty way in the first place, by a
software bug or even maliciously.
This commit prevents panics in the situation where there are fewer
buckets than described by the spans. Note that the missing buckets
will simply not be iterated over. There is no signalling of this
problem. We might still consider this separately, but for now, I would
say that this kind of corruption is exceedingly rare and doesn't
deserve special treatment (which would add a whole lot of complexity to
the code).
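To illustrate the shape of the guard, here is a hypothetical sketch (the real iterators in model/histogram are more involved): once the next bucket index would run past the buckets actually present, Next simply returns false instead of indexing out of range.

```go
package main

import "fmt"

// Span mirrors the exposition model: skip Offset bucket indices, then
// Length consecutive populated buckets follow.
type Span struct {
	Offset int32
	Length uint32
}

// bucketIterator is a stand-in for the real iterators in model/histogram.
type bucketIterator struct {
	spans   []Span
	buckets []float64
	span    int    // index of the current span
	inSpan  uint32 // position within the current span
	bucket  int    // index into buckets
}

func (it *bucketIterator) Next() (float64, bool) {
	// Skip spans that are exhausted (or empty).
	for it.span < len(it.spans) && it.inSpan >= it.spans[it.span].Length {
		it.span++
		it.inSpan = 0
	}
	if it.span >= len(it.spans) {
		return 0, false
	}
	// The new guard: the spans describe more buckets than are present.
	// Stop iterating rather than panic; the missing buckets are skipped.
	if it.bucket >= len(it.buckets) {
		return 0, false
	}
	v := it.buckets[it.bucket]
	it.inSpan++
	it.bucket++
	return v, true
}

func main() {
	// The spans promise 4 buckets, but only 2 are present.
	it := bucketIterator{
		spans:   []Span{{Offset: 0, Length: 4}},
		buckets: []float64{1, 2},
	}
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		fmt.Println(v)
	}
}
```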
Signed-off-by: beorn7 <beorn@grafana.com>
`histogram.Error` becomes the generic wrapper type for all histogram errors.
When adding new errors, this makes it easier and less error-prone to check
whether an error is a histogram error, and it also makes converting the
errors less error-prone.
This changes the type of those specific sentinel errors from `error` to
`histogram.Error`, but it should almost never matter.
e.g., `errors.Is(err, ErrHistogram...)` would still work out of the box.
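As an illustration of the pattern, here is a sketch with a made-up sentinel (the actual definitions live in the histogram package): the wrapper stores the underlying error and implements Unwrap, so `errors.Is` against a sentinel keeps working, while a single `errors.As` check answers whether an error is a histogram error at all.

```go
package main

import (
	"errors"
	"fmt"
)

// Error is a sketch of the generic wrapper: it carries the underlying
// error and implements Unwrap.
type Error struct {
	err error
}

func (e Error) Error() string { return e.err.Error() }
func (e Error) Unwrap() error { return e.err }

// ErrExampleSpanMismatch is a made-up sentinel for illustration only.
var ErrExampleSpanMismatch = Error{errors.New("histogram spans do not match buckets")}

func main() {
	err := fmt.Errorf("ingesting sample: %w", ErrExampleSpanMismatch)

	// Checking for the specific sentinel works as before.
	fmt.Println(errors.Is(err, ErrExampleSpanMismatch)) // true

	// Checking for "any histogram error" is now a single type check.
	var hErr Error
	fmt.Println(errors.As(err, &hErr)) // true
}
```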
Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>
Fixes #17255.
The implementation happens mostly in the Add and Sub methods, but the reconciliation works for all relevant operations. For example, you can now `rate` over a range wherein the custom bucket boundaries are changing.
Any custom bucket reconciliation is flagged with an info-level annotation.
---------
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
Signed-off-by: Linas Medžiūnas <linasm@users.noreply.github.com>
Fixes #17308.
As explained there, adding the warn annotation about conflicting
counter reset hints doesn't happen consistently. Furthermore, because
incremental mean calculation has been used so far (which involves
subtraction), avg calculation always created gauge histograms.
The fix is to make Sub behave like Add WRT counter reset handling, and
then set the result of a subtraction to gauge explicitly in actual
PromQL subtraction (rather than using Sub for something else, like
incremental mean calculation). Also, track the presence of a
CounterReset hint and a NotCounterReset hint separately for the
entirety of aggregated histograms and create the warn-annotation based
on that.
As a minor fix, this commit also makes the warn annotation created in
aggregation consistently refer to "aggregation" rather than
"subtraction" or "addition", because the latter are just internal
operations within the aggregation and not of interest to the
user.
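Here is a hypothetical sketch of that tracking (the real code lives in the PromQL engine and uses its annotations machinery): remember whether an explicit CounterReset and an explicit NotCounterReset hint were seen anywhere in the aggregation, and warn once at the end if both occurred.

```go
package main

import "fmt"

// CounterResetHint is a stand-in for histogram.CounterResetHint.
type CounterResetHint int

const (
	UnknownCounterReset CounterResetHint = iota
	CounterReset
	NotCounterReset
	GaugeType
)

// aggState tracks the hints seen across all histograms of one aggregation.
type aggState struct {
	sawCounterReset    bool
	sawNotCounterReset bool
}

func (s *aggState) observe(h CounterResetHint) {
	switch h {
	case CounterReset:
		s.sawCounterReset = true
	case NotCounterReset:
		s.sawNotCounterReset = true
	}
}

// conflict reports whether a warn annotation should be emitted once,
// independently of the order in which the samples were added or subtracted.
func (s *aggState) conflict() bool {
	return s.sawCounterReset && s.sawNotCounterReset
}

func main() {
	var s aggState
	for _, h := range []CounterResetHint{NotCounterReset, CounterReset, UnknownCounterReset} {
		s.observe(h)
	}
	if s.conflict() {
		fmt.Println("warn: conflicting counter reset hints during aggregation")
	}
}
```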
Signed-off-by: beorn7 <beorn@grafana.com>
We have always validated that none of the buckets is negative. We
should do the same for the count of observations and the zero bucket.
Note that this was always implied in the protobuf exposition format
because a count or a zero bucket population is ignored if it is not
positive.
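A minimal sketch of the added checks, with illustrative names rather than the actual Validate implementation:

```go
package main

import "fmt"

// floatHistogram is a stand-in with illustrative fields only.
type floatHistogram struct {
	Count           float64
	ZeroCount       float64
	PositiveBuckets []float64
	NegativeBuckets []float64
}

func validate(h *floatHistogram) error {
	if h.Count < 0 {
		return fmt.Errorf("histogram has negative observation count %v", h.Count)
	}
	if h.ZeroCount < 0 {
		return fmt.Errorf("histogram has negative zero bucket population %v", h.ZeroCount)
	}
	for _, buckets := range [][]float64{h.PositiveBuckets, h.NegativeBuckets} {
		for i, b := range buckets {
			if b < 0 {
				return fmt.Errorf("bucket %d has negative count %v", i, b)
			}
		}
	}
	return nil
}

func main() {
	fmt.Println(validate(&floatHistogram{Count: -1}))
}
```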
Signed-off-by: beorn7 <beorn@grafana.com>
Allow schemas -9..52 instead of just -4..8, but reduce the resolution
to schema 8 if it is above that.
The reduction code path will be slow, but we only expect to hit it if
the TSDB already has higher-resolution samples and we are in a rollback.
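For context, a sketch of the index arithmetic behind such a resolution reduction (illustrative, not the actual code): at the standard exponential schemas, lowering the schema by d merges runs of 2^d adjacent buckets, so bucket index idx maps to ceil(idx / 2^d).

```go
package main

import "fmt"

// targetIdx maps a bucket index at schema 'from' to the corresponding
// index at the lower-resolution schema 'to' (to < from). Each step down
// halves the number of buckets per power of two, so 2^(from-to) adjacent
// buckets collapse into one and idx maps to ceil(idx / 2^(from-to)).
func targetIdx(idx, from, to int32) int32 {
	return ((idx - 1) >> uint(from-to)) + 1
}

func main() {
	// Reducing from schema 10 to 8 merges groups of 4 buckets.
	for _, idx := range []int32{1, 2, 3, 4, 5} {
		fmt.Println(idx, "->", targetIdx(idx, 10, 8))
	}
}
```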
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Otherwise, higher-level code like PromQL needs to constantly check if it
can handle the samples.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Histogram.Validate and FloatHistogram.Validate now return error on
unsupported schemas.
Scrape and the remote-write handler reduce the schema to the maximum
allowed if it is above that maximum but still below the theoretical
maximum of 52.
For scrape, the maximum is a configuration option; for remote-write, it is 8.
Note: The OTLP endpoint already does the reduction, without checking
that the schema is below 52, as the spec does not specify a maximum.
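A sketch of that clamping with illustrative names (the real logic is spread over scrape and the remote-write handler and reduces the histogram itself, e.g. via ReduceResolution):

```go
package main

import "fmt"

const (
	minSchema = -9 // smallest exponential schema now allowed
	maxSchema = 52 // theoretical maximum resolution
)

// clampSchema returns the schema a sample should be stored with, or an
// error if the schema is unsupported altogether (custom-bucket schemas
// are not considered in this sketch).
func clampSchema(schema, maxAllowed int32) (int32, error) {
	if schema < minSchema || schema > maxSchema {
		return 0, fmt.Errorf("histogram has unsupported schema %d", schema)
	}
	if schema > maxAllowed {
		// The caller then reduces the histogram's resolution to
		// maxAllowed before ingesting it.
		return maxAllowed, nil
	}
	return schema, nil
}

func main() {
	s, err := clampSchema(10, 8) // remote-write uses 8 as the maximum
	fmt.Println(s, err)          // 8 <nil>
}
```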
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.
This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency
about what was done automatically.
Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.
Signed-off-by: beorn7 <beorn@grafana.com>
The custom values are the "le" bucket boundaries of native histograms
with custom buckets. They are never modified, so it is OK to not copy
them when iterating a chunk but to just reference them.
If we ever have a function that modifies the custom values ('trim',
for example), that function will have to make a copy on write.
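To illustrate the copy-on-write rule with a made-up trim operation (nothing like this exists yet, and a real trim would also have to fold the affected bucket counts together, which is elided here):

```go
package main

import "fmt"

// floatHistogram is a stand-in with just the field relevant here.
type floatHistogram struct {
	// CustomValues holds the "le" boundaries. It may be shared with the
	// chunk the histogram was read from, so it is treated as immutable.
	CustomValues []float64
}

// trimAbove is hypothetical: it must not touch the shared slice, so it
// copies before writing.
func (h *floatHistogram) trimAbove(limit float64) {
	n := 0
	for _, v := range h.CustomValues {
		if v <= limit {
			n++
		}
	}
	if n == len(h.CustomValues) {
		return // nothing changes, keep referencing the shared slice
	}
	trimmed := make([]float64, n)
	copy(trimmed, h.CustomValues[:n]) // copy on write
	h.CustomValues = trimmed
}

func main() {
	shared := []float64{0.1, 0.5, 1, 5}
	h := &floatHistogram{CustomValues: shared}
	h.trimAbove(1)
	fmt.Println(h.CustomValues) // [0.1 0.5 1]
	fmt.Println(shared)         // [0.1 0.5 1 5] - untouched
}
```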
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
promql: correct binary operator behaviour for mixed histogram and float samples
An annotation is now added for invalid pairings of sample types.
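A hypothetical sketch of the dispatch involved, assuming that only multiplication, and division with the histogram on the left-hand side, are meaningful between a histogram and a float sample (the real code lives in the engine's vector binary operation handling and uses its annotation types):

```go
package main

import "fmt"

// sample is a stand-in for a PromQL sample that is either a float or a
// native histogram; the actual payloads are elided.
type sample struct {
	isHistogram bool
}

// mixedBinop sketches the dispatch for the case where exactly one operand
// is a histogram. It returns an annotation text instead of a result for
// pairings that have no defined meaning.
func mixedBinop(op string, lhs, rhs sample) (ok bool, annotation string) {
	if lhs.isHistogram == rhs.isHistogram {
		return true, "" // float/float and histogram/histogram handled elsewhere
	}
	switch {
	case op == "*":
		return true, "" // scale the histogram by the float (elided)
	case op == "/" && lhs.isHistogram:
		return true, "" // divide the histogram by the float (elided)
	default:
		return false, fmt.Sprintf("incompatible sample types for operator %q, dropping sample", op)
	}
}

func main() {
	_, ann := mixedBinop("+", sample{isHistogram: true}, sample{})
	fmt.Println(ann)
}
```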
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>