Fixes#17255.
The implementation happens mostly in the Add and Sub method, but the reconciliation works for all relevant operations. For example, you can now `rate` over a range wherein the custom bucket boundaries are changing.
Any custom bucket reconciliation is flagged with an info-level annotation.
---------
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
Signed-off-by: Linas Medžiūnas <linasm@users.noreply.github.com>
The detailed plan for this is laid out in
https://github.com/prometheus/prometheus/issues/16572 .
This commit adds a global and local scrape config option
`scrape_native_histograms`, which has to be set to true to ingest
native histograms.
To ease the transition, the feature flag is changed to simply set the
default of `scrape_native_histograms` to true.
Further implications:
- The default scrape protocols now depend on the
`scrape_native_histograms` setting.
- Everywhere else, histograms are now "on by default".
Documentation beyond the one for the feature flag and the scrape
config are deliberately left out. See
https://github.com/prometheus/prometheus/pull/17232 for that.
Signed-off-by: beorn7 <beorn@grafana.com>
While investigating +10% CPU in v3.7 release, found that ~5% is from
expanding the types map. Try reuse.
Also fix a linter error.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
So far, we emitted a `HistogramCounterResetCollisionWarning` when
encountering conflicting counter resets in the calculation of (i)rate
and friends. We even tested for that. However, in the rate
calculation, we are not interested in those collisions. They are
actually expected.
On the other hand, we did not warn about those collisions when doing a
`sum` aggregation, where such a warning would be appropriate.
This commit removes the warning in the former case and adds it in the
latter. Sadly, we cannot really test this as we still remove the
counter reset hint for the first sample in a chunk. (And that's the
only sample where we could get a `NotCounterReset` hint.)
Signed-off-by: beorn7 <beorn@grafana.com>
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.
This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency what
was done automatically.
Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.
Signed-off-by: beorn7 <beorn@grafana.com>
* fix(promql): histogram_quantile NaN observed in native histogram
Fixes: #16578
See the issue for detailed explanation.
When a histogram had only NaN observations and no normal observations,
we returned 0 from the quantile, which is completely wrong. If there were
normal observations but we went over them, we returned the upper bound of
the existing buckets, however that contradicts expectations on
histogram_fraction. Now we return NaN if the quantile is calculated to be
over all normal observations, falling into NaNs (in a virtual +Inf bucket).
We also return info level annotations if we see any NaN observations.
The annotation calls out if we returned NaN or even if we took the
virtual +Inf bucket into account.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* fix(promql): histogram_fraction NaN observed in native histogram
Fixes: #16580
According to the specification we should not take NaN observations
into account when calculating the fraction. This commit fixes that
and adds an info level annotation to let the user know about this.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Provide PromQL info annotations when rate()/increase() over series without counter label
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* Address comments
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
---------
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* Bump prometheus/common to v0.63.0
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* nolint usage of deprecated model.NameValidationScheme
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
---------
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* util/httputil: Benchmark newCompressedResponseWriter
This benchmark illustrates that newCompressedResponseWriter incurs a
prohibitive amount of heap allocations when handling a request containing a
malicious Accept-Encoding header.¬
Signed-off-by: jub0bs <jcretel-infosec+github@protonmail.com>
* util/httputil: Improve newCompressedResponseWriter
This change dramatically reduces the heap allocations (in bytes)
incurred when handling a request containing a malicious Accept-Encoding header.
Below are some benchmark results; for conciseness, I've omitted the name of the
benchmark function (BenchmarkNewCompressionHandler_MaliciousAcceptEncoding):
```
goos: darwin
goarch: amd64
pkg: github.com/prometheus/prometheus/util/httputil
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
│ old │ new │
│ sec/op │ sec/op vs base │
18.60m ± 2% 13.54m ± 3% -27.17% (p=0.000 n=10)
│ old │ new │
│ B/op │ B/op vs base │
16785442.50 ± 0% 32.00 ± 0% -100.00% (p=0.000 n=10)
│ old │ new │
│ allocs/op │ allocs/op vs base │
2.000 ± 0% 1.000 ± 0% -50.00% (p=0.000 n=10)
```
Signed-off-by: jub0bs <jcretel-infosec+github@protonmail.com>
---------
Signed-off-by: jub0bs <jcretel-infosec+github@protonmail.com>
Resolves: #15559
As accurately noted in the issue description, the map is shared among
child loggers that get created when `WithAttr()`/`WithGroup()` are
called on the underlying handler, which happens via `log.With()` and
`log.WithGroup()` respectively.
The RW mutex was a value in the previous implementation that used
go-kit/log, and I should've updated it to use a pointer when I converted
the deduper.
Also adds a test.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Improves upon #15434, better resolves#15433.
This commit introduces a few changes to ensure safer handling of the
JSONFileLogger:
- the JSONFileLogger struct now implements the slog.Handler interface,
so it can directly be used to create slog Loggers. This pattern more
closely aligns with upstream slog usage (which is generally based around
handlers), as well as making it clear that devs are creating a whole new
logger when interacting with it (vs silently modifying internal configs
like it did previously).
- updates the `promql.QueryLogger` interface to be a union of the
methods of both the `io.Closer` interface and the `slog.Handler`
interface. This allows for plugging in/using slog-compatible loggers
other than the JSONFileLogger, if desired (ie, for downstream projects).
- introduces new `scrape.FailureLogger` interface; just like
`promql.QueryLogger`, it is a union of `io.Closer` and `slog.Handler`
interfaces. Similar logic applies to reasoning.
- updates tests where needed; have the `FakeQueryLogger` from promql's
engine_test implement the `slog.Handler`, improve JSONFileLogger test
suite, etc.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Reduce string manipulation by just cutting off the histogram suffixes from
the series name label once.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Resolves: #15433
When I converted prometheus to use slog in #14906, I update both the
`QueryLogger` interface, as well as how the log calls to the
`QueryLogger` were built up in `promql.Engine.exec()`. The backing
logger for the `QueryLogger` in the engine is a
`util/logging.JSONFileLogger`, and it's implementation of the `With()`
method updates the logger the logger in place with the new keyvals added
onto the underlying slog.Logger, which means they get inherited onto
everything after. All subsequent calls to `With()`, even in later
queries, would continue to then append on more and more keyvals for the
various params and fields built up in the logger. In turn, this causes
unbounded growth of the logger, leading to increased memory usage, and
in at least one report was the likely cause of an OOM kill. More
information can be found in the issue and the linked slack thread.
This commit does a few things:
- It was referenced in feedback in #14906 that it would've been better
to not change the `QueryLogger` interface if possible, this PR
proposes changes that bring it closer to alignment with the pre-3.0
`QueryLogger` interface contract
- reverts `promql.Engine.exec()`'s usage of the query logger to the
pattern of building up an array of args to pass at once to the end log
call. Avoiding the repetitious calls to `.With()` are what resolve the
issue with the logger growth/memory usage.
- updates the scrape failure logger to use the update `QueryLogger`
methods in the contract.
- updates tests accordingly
- cleans up unused methods
Builds and passes tests successfully. Tested locally and confirmed I
could no longer reproduce the issue/it resolved the issue.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
PromQL: Correct the behaviour of some operator and aggregators with Native Histograms
---------
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
In general aim for the happy case when the exposer lists the buckets
in ascending order.
Use Compact(2) to compact the result of nhcb convert.
This is more in line with how client_golang optimizes spans vs
buckets.
aef8aedb4b/prometheus/histogram.go (L1485)
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
I used these wrapper methods during initial development of the custom
handler that the deduper now implements. Since the deduper implements
slog.Handler and can be used directly as a logger, these wrapper methods
are no longer needed.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>