* Bump prometheus/common to v0.63.0
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* nolint usage of deprecated model.NameValidationScheme
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
---------
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Global and Data Source configurations can specify legacy mode, but Prometheus now requires that the overall validation mode be set to UTF-8
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Rationales:
* metadata-wal-records might be deprecated and replaced going forward: https://github.com/prometheus/prometheus/issues/15911
* PRW 2.0 works without metadata just fine (although it sends untyped metrics as expected).
Signed-off-by: bwplotka <bwplotka@gmail.com>
The was a bug (due to confusion?) on the local metadata cache that is cached
by metric family not the series metric name. The fix is to NOT use that local cache
at all (it's still needed for current metadata API implementation, added TODO
on how we can get rid of it).
I went ahead and also rename Metric field in metadata structs to MetricFamily to make
clear it's not always __name__.
Signed-off-by: bwplotka <bwplotka@gmail.com>
What
Adds support for OTLP delta temporality to the OTLP endpoint.
This is done by calling the deltatocumulative processor from the OpenTelemetry collector during OTLP conversion.
Why
Delta conversion is a naturally stateful process, which requires careful request routing when operated inside a collector.
Prometheus is already stateful and doing the conversion in-server reduces the operational burden on the ingest architecture by only having one stateful component.
How
deltatocumulative is a OTel collector component that works as follows:
* pmetric.Metrics come from a receiver or in this case from the HTTP client
* It operates as an in-place update loop:
* for each sample, if not delta, leave unmodified
* if delta, do:
* state += sample, where state is the in-memory sum of all previous samples
* sample = state, sample value is now cumulative
* this is supported for sums (counters), gauges, histograms (old histograms) and exponential histograms (native histograms)
If a series receives no new samples for 5m, its state is removed from memory
Performance
Delta performance is a stateful operation and the OTel code is not highly optimized yet, e.g. it locks the entire processor for each request. Nonetheless, care has been taken to mitigate those effects:
delta conversion is behind a feature flag. If disabled, no conversion code is ever invoked
if enabled, conversion is not invoked if request not actually contains delta samples. This leads to no measureable performance difference between default-cumulative to convert-cumulative (only cumulative, feature on/off)
Signed-off-by: sh0rez <me@shorez.de>
Fix issues raised by staticcheck
We are not enabling staticcheck explicitly, though, because it has too many false positives.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
This commit introduced two field in `/status` endpoint:
- The node currently serving the request.
- The current server time for debugging time drift issues.
fixes#15394.
Signed-off-by: sujal shah <sujalshah28092004@gmail.com>
Instead of storing discovered labels on every target, recompute them if
required. The `Target` struct now needs to hold some more data required
to recompute them, such as ScrapeConfig.
This moves the load from every Prometheus all of the time, to just when
someone views Service Discovery in the UI.
The way `PopulateLabels` is used changes; you are no longer expected to
call it with a part-populated `labels.Builder`.
The signature of `Target.Labels` changes to take a `labels.Builder`
instead of a `ScratchBuilder`, for consistency with `DiscoveredLabels`.
This will save a lot of work when many targets are filtered out in
relabeling. Combine with `keep_dropped_targets` to avoid ever computing
most labels for dropped targets.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Previously, the api was evaluating this regex to determine if the label
name was valid or not:
14bac55a99/model/labels.go (L94)
However, I believe that the `IsValid()` function is what ought to be
used in the brave new utf8 era.
**Before**
```
$ curl localhost:9090/api/v1/label/host.name/values
{"status":"error","errorType":"bad_data","error":"invalid label name: \"host.name\""}
```
**After**
```
$ curl localhost:9090/api/v1/label/host.name/values
{"status":"success","data":["localhost"]}
```
It's very likely that I'm missing something here or you were already
planning to do this at some point but I just encountered this issue and
figured I'd give it a go.
Signed-off-by: Owen Williams <owen.williams@grafana.com>
For: #14355
This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
api: Improve doc comments for v1.MinTime and v1.MaxTime
While investigated something mostly unrelated, I got nerd-sniped by
the calculation of v1.MinTime and v1.MaxTime. The seemingly magic
number in there (62135596801) needed an explanation. While looking for
it, I found out that the offsets used here are actually needlessly
conservative. Since the timestamps are so far in the past or future,
respectively, that there is no practical impact, except that the
calculation is needlessly obfuscated. However, we won't change the
values now to not cause any confusion for users of this code. Still, I
think the doc comment should explain the circumstances so nobody gets
nerd-sniped again as I did today.
For the record: 62135596800 is the difference in seconds between 0001-01-01 00:00:00 (Unix time zero point) and 1971-01-01 00:00:00 (Go time zero point) in the Gregorian calendar. If "Prometheus time" were in seconds (not milliseconds), that difference would be relevant to prevent over-/underflow when converting from "Prometheus time" to "Go time".
Signed-off-by: beorn7 <beorn@grafana.com>
---------
Signed-off-by: beorn7 <beorn@grafana.com>
The OTLP receiver can now considered stable. We've had it for longer
than a year in main and has received constant improvements.
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
This switches from the prehistoric EventSource API to the more modern
fetch-event-source package. That packages gives us full control over the
retries.
It also gives us the opportunity to close the event source when the
browser tab is hidden, saving resources.
Signed-off-by: Julien <roidelapluie@o11y.eu>
This commit introduces a new `/api/v1/notifications/live` endpoint that
utilizes Server-Sent Events (SSE) to stream notifications to the web UI.
This is used to display alerts such as when a configuration reload
has failed.
I opted for SSE over WebSockets because SSE is simpler to implement and
more robust for our use case. Since we only need one-way communication
from the server to the client, SSE fits perfectly without the overhead
of establishing and maintaining a two-way WebSocket connection.
When the SSE connection fails, we go back to a classic
/api/v1/notifications API endpoint.
This commit also contains the required UI changes for the new Mantine UI.
Signed-off-by: Julien <roidelapluie@o11y.eu>