So far, we emitted a `HistogramCounterResetCollisionWarning` when
encountering conflicting counter resets in the calculation of (i)rate
and friends. We even tested for that. However, in the rate
calculation, we are not interested in those collisions. They are
actually expected.
On the other hand, we did not warn about those collisions when doing a
`sum` aggregation, where such a warning would be appropriate.
This commit removes the warning in the former case and adds it in the
latter. Sadly, we cannot really test this as we still remove the
counter reset hint for the first sample in a chunk. (And that's the
only sample where we could get a `NotCounterReset` hint.)
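
For illustration only, here is a minimal sketch of the kind of check that makes sense for `sum`. The type, the constants, and `hasResetCollision` are simplified stand-ins written for this note, and the collision rule (a `CounterReset` hint meeting a `NotCounterReset` hint) is an assumption rather than the exact logic in the engine.

```go
package main

import "fmt"

// CounterResetHint is a simplified stand-in for the hint stored with a
// native histogram sample (hypothetical, for illustration).
type CounterResetHint int

const (
	UnknownCounterReset CounterResetHint = iota
	CounterReset
	NotCounterReset
	GaugeType
)

// hasResetCollision reports whether the operands of an aggregation carry
// both a CounterReset and a NotCounterReset hint. This is the assumed
// collision rule; unknown and gauge hints never collide in this sketch.
func hasResetCollision(hints []CounterResetHint) bool {
	var sawReset, sawNotReset bool
	for _, h := range hints {
		switch h {
		case CounterReset:
			sawReset = true
		case NotCounterReset:
			sawNotReset = true
		}
	}
	return sawReset && sawNotReset
}

func main() {
	// A sum over these two operands would deserve a warning.
	fmt.Println(hasResetCollision([]CounterResetHint{CounterReset, NotCounterReset}))
	// An unknown hint next to a reset is not a conflict.
	fmt.Println(hasResetCollision([]CounterResetHint{UnknownCounterReset, CounterReset}))
}
```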
Signed-off-by: beorn7 <beorn@grafana.com>
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.
This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that it is transparent what
was done automatically.
Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.
Signed-off-by: beorn7 <beorn@grafana.com>
* fix(promql): histogram_quantile NaN observed in native histogram
Fixes: #16578
See the issue for detailed explanation.
When a histogram had only NaN observations and no normal observations,
we returned 0 from the quantile, which is completely wrong. If there were
normal observations but we went over them, we returned the upper bound of
the existing buckets. However, that contradicts expectations on
histogram_fraction. Now we return NaN if the quantile is calculated to be
over all normal observations, falling into NaNs (in a virtual +Inf bucket).
We also return info level annotations if we see any NaN observations.
The annotation calls out if we returned NaN or even if we took the
virtual +Inf bucket into account.
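
A rough sketch of the intended behaviour, assuming a heavily simplified histogram: the `bucket` type and `quantileWithNaN` are hypothetical, there is no interpolation inside a bucket, and the NaN observations are treated as sitting above all normal ones (the virtual +Inf bucket).

```go
package main

import (
	"fmt"
	"math"
)

// bucket is a hypothetical, simplified bucket holding a non-cumulative
// count of observations with values up to and including upper.
type bucket struct {
	upper float64
	count float64
}

// quantileWithNaN returns the upper bound of the bucket containing the
// requested rank. Buckets must be sorted by ascending upper bound.
// It returns NaN if there are no normal observations or if the rank
// falls beyond them, i.e. into the NaN observations.
func quantileWithNaN(q float64, buckets []bucket, nanCount float64) float64 {
	var normal float64
	for _, b := range buckets {
		normal += b.count
	}
	rank := q * (normal + nanCount)
	if normal == 0 || rank > normal {
		return math.NaN()
	}
	var cumulative float64
	for _, b := range buckets {
		cumulative += b.count
		if cumulative >= rank {
			return b.upper
		}
	}
	return math.NaN()
}

func main() {
	buckets := []bucket{{upper: 1, count: 5}, {upper: 2, count: 5}}
	fmt.Println(quantileWithNaN(0.25, buckets, 10)) // rank 5 lies in the first bucket
	fmt.Println(quantileWithNaN(0.9, buckets, 10))  // rank 18 lies among the NaNs
	fmt.Println(quantileWithNaN(0.5, nil, 3))       // only NaN observations
}
```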
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* fix(promql): histogram_fraction NaN observed in native histogram
Fixes: #16580
According to the specification we should not take NaN observations
into account when calculating the fraction. This commit fixes that
and adds an info level annotation to let the user know about this.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Provide PromQL info annotations when rate()/increase() over series without counter label
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* Address comments
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
---------
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* Bump prometheus/common to v0.63.0
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* nolint usage of deprecated model.NameValidationScheme
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
---------
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* util/httputil: Benchmark newCompressedResponseWriter
This benchmark illustrates that newCompressedResponseWriter incurs a
prohibitive amount of heap allocations when handling a request containing a
malicious Accept-Encoding header.
Signed-off-by: jub0bs <jcretel-infosec+github@protonmail.com>
* util/httputil: Improve newCompressedResponseWriter
This change dramatically reduces the heap allocations (in bytes)
incurred when handling a request containing a malicious Accept-Encoding header.
Below are some benchmark results; for conciseness, I've omitted the name of the
benchmark function (BenchmarkNewCompressionHandler_MaliciousAcceptEncoding):
```
goos: darwin
goarch: amd64
pkg: github.com/prometheus/prometheus/util/httputil
cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
│     old     │                new                 │
│   sec/op    │   sec/op     vs base               │
  18.60m ± 2%   13.54m ± 3%  -27.17% (p=0.000 n=10)

│       old        │                new                 │
│       B/op       │    B/op      vs base               │
  16785442.50 ± 0%    32.00 ± 0%  -100.00% (p=0.000 n=10)

│    old     │                new                │
│ allocs/op  │ allocs/op   vs base               │
  2.000 ± 0%   1.000 ± 0%  -50.00% (p=0.000 n=10)
```
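
As an illustration of how such a header can be scanned without allocating a slice proportional to its length, here is a minimal sketch. `selectEncoding` is a hypothetical helper written for this note, not necessarily the code in util/httputil; it walks the header with `strings.Cut` instead of splitting it up front, and it ignores quality values.

```go
package main

import (
	"fmt"
	"strings"
)

// selectEncoding returns the first supported encoding named in an
// Accept-Encoding header value, or "" if none is found. It consumes the
// header one comma-separated element at a time, so a header made of
// millions of commas does not produce a large slice.
func selectEncoding(header string) string {
	for len(header) > 0 {
		var part string
		part, header, _ = strings.Cut(header, ",")
		switch strings.TrimSpace(part) {
		case "gzip":
			return "gzip"
		case "deflate":
			return "deflate"
		}
	}
	return ""
}

func main() {
	fmt.Println(selectEncoding("br, gzip"))
	fmt.Println(selectEncoding(strings.Repeat(",", 1<<20)) == "")
}
```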
Signed-off-by: jub0bs <jcretel-infosec+github@protonmail.com>
---------
Signed-off-by: jub0bs <jcretel-infosec+github@protonmail.com>
Resolves: #15559
As accurately noted in the issue description, the map is shared among
child loggers that get created when `WithAttr()`/`WithGroup()` are
called on the underlying handler, which happens via `log.With()` and
`log.WithGroup()` respectively.
The RW mutex was a value in the previous implementation that used
go-kit/log, and I should've updated it to use a pointer when I converted
the deduper.
Also adds a test.
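
To show why the pointer matters, here is a minimal sketch of a deduplicating `slog.Handler`. The `dedupeHandler` type is a hypothetical simplification (it dedupes on the message only and never expires entries), not the actual `Deduper`. Child handlers share the map, so they must share the same mutex, and only `Handle()` touches the map.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
	"sync"
)

// dedupeHandler drops records whose message has been seen before.
type dedupeHandler struct {
	next slog.Handler
	mu   *sync.RWMutex       // pointer, so children lock the same mutex
	seen map[string]struct{} // shared with all children
}

func newDedupeHandler(next slog.Handler) *dedupeHandler {
	return &dedupeHandler{next: next, mu: &sync.RWMutex{}, seen: map[string]struct{}{}}
}

func (d *dedupeHandler) Enabled(ctx context.Context, l slog.Level) bool {
	return d.next.Enabled(ctx, l)
}

// Handle is the only method that reads or writes the shared map.
func (d *dedupeHandler) Handle(ctx context.Context, r slog.Record) error {
	d.mu.Lock()
	_, dup := d.seen[r.Message]
	d.seen[r.Message] = struct{}{}
	d.mu.Unlock()
	if dup {
		return nil
	}
	return d.next.Handle(ctx, r)
}

// WithAttrs and WithGroup copy the pointers, not the mutex value.
func (d *dedupeHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return &dedupeHandler{next: d.next.WithAttrs(attrs), mu: d.mu, seen: d.seen}
}

func (d *dedupeHandler) WithGroup(name string) slog.Handler {
	return &dedupeHandler{next: d.next.WithGroup(name), mu: d.mu, seen: d.seen}
}

// demoSharedDedupe returns the number of lines that reach the output.
func demoSharedDedupe() int {
	var buf bytes.Buffer
	logger := slog.New(newDedupeHandler(slog.NewTextHandler(&buf, nil)))
	child := logger.With("component", "remote")
	logger.Info("disk full")
	child.Info("disk full") // suppressed: the child shares the parent's map
	return bytes.Count(buf.Bytes(), []byte("\n"))
}

func main() {
	fmt.Println(demoSharedDedupe())
}
```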
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Improves upon #15434, better resolves #15433.
This commit introduces a few changes to ensure safer handling of the
JSONFileLogger:
- the JSONFileLogger struct now implements the slog.Handler interface,
so it can directly be used to create slog Loggers. This pattern more
closely aligns with upstream slog usage (which is generally based around
handlers), as well as making it clear that devs are creating a whole new
logger when interacting with it (vs silently modifying internal configs
like it did previously).
- updates the `promql.QueryLogger` interface to be a union of the
methods of both the `io.Closer` interface and the `slog.Handler`
interface. This allows for plugging in/using slog-compatible loggers
other than the JSONFileLogger, if desired (i.e., for downstream projects).
- introduces new `scrape.FailureLogger` interface; just like
`promql.QueryLogger`, it is a union of `io.Closer` and `slog.Handler`
interfaces. The same reasoning applies.
- updates tests where needed; have the `FakeQueryLogger` from promql's
engine_test implement the `slog.Handler`, improve JSONFileLogger test
suite, etc.
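
A minimal sketch of the interface shape described above. The local `QueryLogger` mirrors the described union of `slog.Handler` and `io.Closer`; `fileHandler` and its constructor are hypothetical stand-ins for the JSONFileLogger, kept small by embedding a JSON handler (so handlers derived via `WithAttrs()` are plain JSON handlers in this sketch).

```go
package main

import (
	"fmt"
	"io"
	"log/slog"
	"os"
)

// QueryLogger is the union of slog.Handler and io.Closer, as described
// in the list above (redeclared locally for the sketch).
type QueryLogger interface {
	slog.Handler
	io.Closer
}

// fileHandler is a hypothetical file-backed handler. It gets the
// slog.Handler methods from the embedded JSON handler and adds Close.
type fileHandler struct {
	slog.Handler
	f *os.File
}

func newFileHandler(f *os.File) *fileHandler {
	return &fileHandler{Handler: slog.NewJSONHandler(f, nil), f: f}
}

func (h *fileHandler) Close() error { return h.f.Close() }

// Compile-time check that fileHandler satisfies the interface.
var _ QueryLogger = (*fileHandler)(nil)

// demoQueryLogger writes one record through a fresh slog.Logger built
// from the handler and returns the number of bytes written to the file.
func demoQueryLogger() (int, error) {
	f, err := os.CreateTemp("", "query-*.log")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())

	var ql QueryLogger = newFileHandler(f)
	slog.New(ql).Info("query", "expr", "up")
	if err := ql.Close(); err != nil {
		return 0, err
	}
	data, err := os.ReadFile(f.Name())
	return len(data), err
}

func main() {
	n, err := demoQueryLogger()
	fmt.Println(n > 0, err)
}
```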
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Reduce string manipulation by just cutting off the histogram suffixes from
the series name label once.
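
A small sketch of the idea, assuming the classic histogram suffixes `_bucket`, `_count` and `_sum`; `splitHistogramSuffix` is a hypothetical helper, not the function touched by this commit. The name is cut once and the base is reused afterwards.

```go
package main

import (
	"fmt"
	"strings"
)

// splitHistogramSuffix cuts a classic histogram suffix off a series name
// and returns the base name together with the suffix that was removed.
// Names without a known suffix are returned unchanged.
func splitHistogramSuffix(name string) (base, suffix string) {
	for _, s := range []string{"_bucket", "_count", "_sum"} {
		if b, ok := strings.CutSuffix(name, s); ok {
			return b, s
		}
	}
	return name, ""
}

func main() {
	base, suffix := splitHistogramSuffix("http_request_duration_seconds_bucket")
	fmt.Println(base, suffix)
}
```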
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Resolves: #15433
When I converted prometheus to use slog in #14906, I updated both the
`QueryLogger` interface and how the log calls to the
`QueryLogger` were built up in `promql.Engine.exec()`. The backing
logger for the `QueryLogger` in the engine is a
`util/logging.JSONFileLogger`, and its implementation of the `With()`
method updates the logger in place with the new keyvals added
onto the underlying slog.Logger, which means they get inherited by
everything after. All subsequent calls to `With()`, even in later
queries, would continue to then append on more and more keyvals for the
various params and fields built up in the logger. In turn, this causes
unbounded growth of the logger, leading to increased memory usage, and
in at least one report was the likely cause of an OOM kill. More
information can be found in the issue and the linked slack thread.
This commit does a few things:
- Feedback in #14906 noted that it would have been better not to
change the `QueryLogger` interface if possible. This PR proposes
changes that bring it closer to the pre-3.0 `QueryLogger` interface
contract.
- reverts `promql.Engine.exec()`'s usage of the query logger to the
pattern of building up an array of args to pass at once to the end log
call. Avoiding the repeated calls to `.With()` is what resolves the
issue with the logger growth/memory usage.
- updates the scrape failure logger to use the updated `QueryLogger`
methods in the contract.
- updates tests accordingly
- cleans up unused methods
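
The args-building pattern can be sketched roughly as follows. `logQuery` and its parameters are hypothetical and much smaller than what `promql.Engine.exec()` logs; the point is that the key/value pairs live in a per-call slice and the shared logger is never modified.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// logQuery collects the key/value pairs for one query in a local slice
// and emits them in a single log call. Nothing is attached to the logger
// itself, so state cannot accumulate across queries.
func logQuery(logger *slog.Logger, query string, queryErr error) {
	args := make([]any, 0, 4)
	args = append(args, "query", query)
	if queryErr != nil {
		args = append(args, "error", queryErr.Error())
	}
	logger.Info("promql query logged", args...)
}

// demoNoGrowth logs the same query twice and reports whether both
// records are identical, i.e. the second one did not inherit extra
// fields from the first.
func demoNoGrowth() bool {
	var buf bytes.Buffer
	opts := &slog.HandlerOptions{
		// Drop the timestamp so the two lines can be compared directly.
		ReplaceAttr: func(_ []string, a slog.Attr) slog.Attr {
			if a.Key == slog.TimeKey {
				return slog.Attr{}
			}
			return a
		},
	}
	logger := slog.New(slog.NewTextHandler(&buf, opts))
	logQuery(logger, "up", nil)
	logQuery(logger, "up", nil)
	lines := strings.Split(strings.TrimSpace(buf.String()), "\n")
	return len(lines) == 2 && lines[0] == lines[1]
}

func main() {
	fmt.Println(demoNoGrowth())
}
```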
Builds and passes tests successfully. Tested locally and confirmed that
I could no longer reproduce the issue.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
PromQL: Correct the behaviour of some operators and aggregators with Native Histograms
---------
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
In general aim for the happy case when the exposer lists the buckets
in ascending order.
Use Compact(2) to compact the result of nhcb convert.
This is more in line with how client_golang optimizes spans vs
buckets.
aef8aedb4b/prometheus/histogram.go (L1485)
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
I used these wrapper methods during initial development of the custom
handler that the deduper now implements. Since the deduper implements
slog.Handler and can be used directly as a logger, these wrapper methods
are no longer needed.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
This change should have been included in the initial prometheus slog
conversion, but I must've lost track of it in all the rebases involved
in that PR.
This changes the dedupe logger so that the only method that needs to use
the lock is the `Handle()` method that actually interacts with the
deduplication map.
Ex:
```
==================
WARNING: DATA RACE
Write at 0x00c000518bc0 by goroutine 29481:
github.com/prometheus/prometheus/util/logging.(*Deduper).WithAttrs()
/home/tjhop/go/src/github.com/prometheus/prometheus/util/logging/dedupe.go:89 +0xef
log/slog.(*Logger).With()
/home/tjhop/.asdf/installs/golang/1.23.1/go/src/log/slog/logger.go:132 +0x106
github.com/prometheus/prometheus/storage/remote.NewQueueManager()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/queue_manager.go:483 +0x7a9
github.com/prometheus/prometheus/storage/remote.(*WriteStorage).ApplyConfig()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/write.go:201 +0x102c
github.com/prometheus/prometheus/storage/remote.(*Storage).ApplyConfig()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage.go:92 +0xfd
github.com/prometheus/prometheus/storage/remote.TestWriteStorageApplyConfigsDuringCommit.func1()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage_test.go:172 +0x3e4
github.com/prometheus/prometheus/storage/remote.TestWriteStorageApplyConfigsDuringCommit.gowrap1()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage_test.go:174 +0x41
Previous read at 0x00c000518bc0 by goroutine 31261:
github.com/prometheus/prometheus/util/logging.(*Deduper).Handle()
/home/tjhop/go/src/github.com/prometheus/prometheus/util/logging/dedupe.go:82 +0x2b1
log/slog.(*Logger).log()
/home/tjhop/.asdf/installs/golang/1.23.1/go/src/log/slog/logger.go:257 +0x228
log/slog.(*Logger).Error()
/home/tjhop/.asdf/installs/golang/1.23.1/go/src/log/slog/logger.go:230 +0x3d4
github.com/prometheus/prometheus/tsdb/wlog.(*Watcher).loop()
/home/tjhop/go/src/github.com/prometheus/prometheus/tsdb/wlog/watcher.go:254 +0x2db
github.com/prometheus/prometheus/tsdb/wlog.(*Watcher).Start.gowrap1()
/home/tjhop/go/src/github.com/prometheus/prometheus/tsdb/wlog/watcher.go:227 +0x33
Goroutine 29481 (running) created at:
github.com/prometheus/prometheus/storage/remote.TestWriteStorageApplyConfigsDuringCommit()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage_test.go:164 +0xe4
testing.tRunner()
/home/tjhop/.asdf/installs/golang/1.23.1/go/src/testing/testing.go:1690 +0x226
testing.(*T).Run.gowrap1()
/home/tjhop/.asdf/installs/golang/1.23.1/go/src/testing/testing.go:1743 +0x44
Goroutine 31261 (running) created at:
github.com/prometheus/prometheus/tsdb/wlog.(*Watcher).Start()
/home/tjhop/go/src/github.com/prometheus/prometheus/tsdb/wlog/watcher.go:227 +0x177
github.com/prometheus/prometheus/storage/remote.(*QueueManager).Start()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/queue_manager.go:934 +0x304
github.com/prometheus/prometheus/storage/remote.(*WriteStorage).ApplyConfig()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/write.go:232 +0x151b
github.com/prometheus/prometheus/storage/remote.(*Storage).ApplyConfig()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage.go:92 +0xfd
github.com/prometheus/prometheus/storage/remote.TestWriteStorageApplyConfigsDuringCommit.func1()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage_test.go:172 +0x3e4
github.com/prometheus/prometheus/storage/remote.TestWriteStorageApplyConfigsDuringCommit.gowrap1()
/home/tjhop/go/src/github.com/prometheus/prometheus/storage/remote/storage_test.go:174 +0x41
==================
--- FAIL: TestWriteStorageApplyConfigsDuringCommit (2.26s)
testing.go:1399: race detected during execution of test
FAIL
FAIL github.com/prometheus/prometheus/storage/remote 68.321s
```
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Fix some edge cases when OOO is enabled
Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Signed-off-by: Vanshika <102902652+Vanshikav123@users.noreply.github.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
promql: correct the behaviour of binary operators for mixed histogram and float samples
For invalid pairings of sample types, an annotation is added now.
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
---------
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
* model: move classic to NHCB conversion into its own file
In preparation for #14978.
Author: Jeanette Tan <jeanette.tan@grafana.com> 2024-07-03 11:56:48
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Jeanette Tan <jeanette.tan@grafana.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Better naming from review comment
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Add doc strings.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Jeanette Tan <jeanette.tan@grafana.com>
For: #14355
This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type
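
The nil guard mentioned in the last item looks roughly like this. `component` and `newComponent` are hypothetical names, and the fallback here is a stdlib logger that discards output; the actual code may use a different no-op logger.

```go
package main

import (
	"fmt"
	"io"
	"log/slog"
)

// component is a hypothetical type that keeps a logger.
type component struct {
	logger *slog.Logger
}

// newComponent accepts a nil logger. Calling With() on a nil
// *slog.Logger would panic, so a discarding logger is substituted first.
func newComponent(logger *slog.Logger) *component {
	if logger == nil {
		logger = slog.New(slog.NewTextHandler(io.Discard, nil))
	}
	return &component{logger: logger.With("component", "example")}
}

func main() {
	c := newComponent(nil)
	c.logger.Info("this goes nowhere, but does not panic")
	fmt.Println(c.logger != nil)
}
```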
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>