The createAttributes error was incorrectly returning nil instead of err,
causing errors to be silently discarded. This could lead to silent data
loss for sum metrics during OTLP ingestion.
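
The failure mode can be sketched as follows. This is a minimal illustration rather than the actual patch: the createAttributes signature, the addSum* helpers and errBadAttributes are all hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// errBadAttributes is a hypothetical error standing in for whatever the
// real createAttributes can fail with.
var errBadAttributes = errors.New("invalid attributes")

// createAttributes is a hypothetical stand-in for the real helper.
func createAttributes(fail bool) ([]string, error) {
	if fail {
		return nil, errBadAttributes
	}
	return []string{"job", "demo"}, nil
}

// addSumBuggy mirrors the bug: the failure is swallowed.
func addSumBuggy(fail bool) error {
	if _, err := createAttributes(fail); err != nil {
		return nil // bug: the caller never learns the data point was dropped
	}
	return nil
}

// addSumFixed propagates the failure to the caller.
func addSumFixed(fail bool) error {
	if _, err := createAttributes(fail); err != nil {
		return err
	}
	return nil
}

func main() {
	fmt.Println("buggy:", addSumBuggy(true))
	fmt.Println("fixed:", addSumFixed(true))
}
```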
Fixes #17953
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* improve readability of time series filtering by using the slices package
Signed-off-by: Callum Styan <callumstyan@gmail.com>
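
A minimal sketch of what filtering with the slices package can look like, assuming Go 1.21 or later. The timeSeries type and the cutoff criterion are hypothetical and not taken from the actual change.

```go
package main

import (
	"fmt"
	"slices"
)

// timeSeries is a hypothetical stand-in for the real time series type.
type timeSeries struct {
	name      string
	timestamp int64 // newest sample timestamp, in milliseconds
}

// dropOlderThan removes, in place, every series whose timestamp is before
// cutoff and returns the shortened slice.
func dropOlderThan(series []timeSeries, cutoff int64) []timeSeries {
	return slices.DeleteFunc(series, func(ts timeSeries) bool {
		return ts.timestamp < cutoff
	})
}

func main() {
	series := []timeSeries{{"a", 10}, {"b", 50}, {"c", 20}, {"d", 90}}
	fmt.Println(dropOlderThan(series, 30))
}
```

slices.DeleteFunc reuses the backing array, so the input slice should not be used after the call.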
* ensure that BenchmarkBuildTimeSeries doesn't include the building of
the actual proto in the benchmark results; we only care about the
buildTimeSeries call
Signed-off-by: Callum Styan <callumstyan@gmail.com>
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
tsdb: Early compaction of stale series
Closes #13616
Based on https://github.com/prometheus/proposals/pull/55
Stale series tracking was added in #16925. This PR compacts the stale series into their own block before the normal compaction kicks in. Here is how the setting works:
stale_series_compaction_threshold: As soon as the ratio of stale series in the head block crosses StaleSeriesImmediateCompactionThreshold, TSDB performs a stale series compaction that writes all the stale series into a block and removes them from the head, but not from the WAL. (Technically, this condition is checked every minute, so the compaction is not exactly immediate.)
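
The threshold check can be sketched roughly like this. The function name is hypothetical, and treating "crosses" as greater-than-or-equal is an assumption, not something this PR description states.

```go
package main

import "fmt"

// staleRatioExceeded reports whether the share of stale series in the head
// has reached the threshold. Using >= is an assumption about what
// "crosses" means; an empty head never triggers a compaction.
func staleRatioExceeded(stale, total uint64, threshold float64) bool {
	if total == 0 {
		return false
	}
	return float64(stale)/float64(total) >= threshold
}

func main() {
	fmt.Println(staleRatioExceeded(30, 100, 0.25))
	fmt.Println(staleRatioExceeded(10, 100, 0.25))
}
```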
Additional details
WAL replay: after a stale series compaction, tombstones covering (MinInt64, MaxInt64) are added for all these stale series. WAL replay has a special case for them: when it finds such a tombstone, it immediately removes the series from memory instead of storing the tombstone. This is required so that memory does not spike during WAL replay and the compacted stale series are not kept in memory.
Head block truncation ignores this block via the added metadata, similar to out-of-order blocks.
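
A rough sketch of the WAL replay special case described above. The head, interval and replayTombstone names are hypothetical simplifications and do not reflect the real TSDB types.

```go
package main

import (
	"fmt"
	"math"
)

// interval is a hypothetical tombstone time range.
type interval struct{ mint, maxt int64 }

// allTime is the marker range written for compacted stale series.
var allTime = interval{math.MinInt64, math.MaxInt64}

// head is a drastically simplified, hypothetical in-memory head.
type head struct {
	series     map[uint64]string
	tombstones map[uint64][]interval
}

// replayTombstone drops the series outright when the tombstone covers all
// of time; any other tombstone is stored as usual.
func (h *head) replayTombstone(ref uint64, iv interval) {
	if iv == allTime {
		delete(h.series, ref)
		return
	}
	h.tombstones[ref] = append(h.tombstones[ref], iv)
}

func main() {
	h := &head{
		series:     map[uint64]string{1: "stale", 2: "live"},
		tombstones: map[uint64][]interval{},
	}
	h.replayTombstone(1, allTime)
	h.replayTombstone(2, interval{100, 200})
	fmt.Println(len(h.series), len(h.tombstones))
}
```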
* otlptranslator: filter __name__ from OTLP attributes to prevent duplicates
OTLP metrics can have a __name__ attribute which, when combined with the
metric name passed via extras, creates duplicate __name__ labels.
This commit filters out any __name__ attribute from OTLP metric attributes.
Also rename TestCreateAttributes to TestPrometheusConverter_createAttributes
for consistency, and add test cases for __name__, __type__, and __unit__ OTLP metric attributes.
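
A minimal sketch of the filtering idea, not the actual converter code. The label type and buildLabels helper are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// label is a hypothetical name/value pair.
type label struct{ name, value string }

const metricNameLabel = "__name__"

// buildLabels copies the OTLP attributes, skips any __name__ attribute, and
// then adds the metric name exactly once.
func buildLabels(attrs map[string]string, metricName string) []label {
	out := make([]label, 0, len(attrs)+1)
	for k, v := range attrs {
		if k == metricNameLabel {
			continue // would otherwise duplicate the metric name label
		}
		out = append(out, label{k, v})
	}
	out = append(out, label{metricNameLabel, metricName})
	sort.Slice(out, func(i, j int) bool { return out[i].name < out[j].name })
	return out
}

func main() {
	attrs := map[string]string{"__name__": "bogus", "host": "a"}
	fmt.Println(buildLabels(attrs, "http_requests_total"))
}
```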
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* otlptranslator: add label caching for OTLP-to-Prometheus conversion
Add per-request caching to reduce redundant computation and allocations
during OTLP metric conversion:
1. Per-request label sanitization cache: Cache sanitized label names
within a request to avoid repeated string allocations for commonly
repeated labels like __name__, job, instance.
2. Resource-level label caching: Precompute and cache job, instance,
promoted resource attributes, and external labels once per
ResourceMetrics boundary instead of for each datapoint.
3. Scope-level label caching: Precompute and cache scope metadata labels
(otel_scope_name, otel_scope_version, etc.) once per ScopeMetrics
boundary.
4. LabelNamer instance caching: Reuse the LabelNamer struct across
datapoints within the same resource context.
These optimizations significantly reduce allocations and improve latency
for OTLP ingestion workloads with many datapoints per resource/scope.
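
The per-request sanitization cache from item 1 could look roughly like this. It is a sketch under assumptions: the sanitizeCache type and the simplified sanitize rule (replace anything outside [a-zA-Z0-9_] with an underscore) are hypothetical, not the real translator logic.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitizeCache memoizes sanitized label names for one request.
type sanitizeCache struct {
	entries map[string]string
	misses  int // number of times sanitize actually ran
}

func newSanitizeCache() *sanitizeCache {
	return &sanitizeCache{entries: map[string]string{}}
}

// sanitize is a simplified, hypothetical rule: every rune outside
// [a-zA-Z0-9_] becomes an underscore.
func sanitize(name string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_':
			return r
		}
		return '_'
	}, name)
}

// get returns the cached sanitized name, computing it on first use.
func (c *sanitizeCache) get(name string) string {
	if s, ok := c.entries[name]; ok {
		return s
	}
	c.misses++
	s := sanitize(name)
	c.entries[name] = s
	return s
}

func main() {
	c := newSanitizeCache()
	for i := 0; i < 3; i++ {
		fmt.Println(c.get("http.method"))
	}
	fmt.Println("sanitize calls:", c.misses)
}
```

Creating a fresh cache per request keeps it bounded and avoids any need for locking across requests.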
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
The benchmark was passing appendMetadata=false to NewCombinedAppender,
which caused UpdateMetadata to never be called on the underlying
noOpAppender. This resulted in app.metadata always being 0, failing
the assertion that metadata count should be positive.
Fix by enabling metadata appending in the benchmark.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
No implementation yet. Just to test the shape of the interface.
AtST is implemented for trivial cases, anything else is hard coded
to return 0.
Ref: https://github.com/prometheus/prometheus/issues/17791
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
NHCB is native histograms with custom buckets.
prompb is used for both remote write 1.0 and remote read. We do not
support NHCB over remote write 1.0; however, we should absolutely
support it for remote read.
Prometheus remote write 1.0 client already refuses to send NHCB.
Prometheus remote write 1.0 server accepts NHCB, but doesn't store
custom values, corrupting the result. I'm now handling NHCB correctly,
instead of refusing or corrupting.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
The original implementation in #9705 for native histograms included a
technical debt (#15177) where samples were committed ordered by type,
not by their append order. This was fixed in #17071, but this docstring
was not updated.
I've also taken the liberty to mention that we do not order by timestamp
either, thus it is possible to append out-of-order samples.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
ReduceResolution is currently called before validation during
ingestion. This will cause a panic if there are not enough buckets in
the histogram. If there are too many buckets, the spurious buckets are
ignored, and therefore the error in the input histogram is masked.
Furthermore, invalid negative offsets might cause problems, too.
Therefore, we need to do some minimal validation in reduceResolution.
Fortunately, it is easy and shouldn't slow things down. Sadly, it
requires returning errors, which triggers a bunch of code changes.
Even here there is a bright side: we can get rid of a few panics.
(Remember: Don't panic!)
In different news, we haven't done a full validation of histograms
read via remote-read. This is not so much a security concern (as you
can throw off Prometheus easily by feeding it bogus data via
remote-read) but more that remote-read sources might be makeshift and
could accidentally create invalid histograms. We really don't want to
panic in that case. So this commit does not only add a check of the
spans and buckets as needed for resolution reduction but also a full
validation during remote-read.
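
The kind of minimal span and bucket check meant here can be sketched as below. The span type, checkSpans function and error values are hypothetical simplifications, not the code in this commit.

```go
package main

import (
	"errors"
	"fmt"
)

// span is a hypothetical simplification of a histogram span.
type span struct {
	offset int32
	length uint32
}

var (
	errNegativeOffset = errors.New("negative offset in a span after the first")
	errBucketMismatch = errors.New("span lengths do not match the number of buckets")
)

// checkSpans returns an error instead of panicking when the spans cannot
// describe the given number of buckets.
func checkSpans(spans []span, numBuckets int) error {
	var want uint64
	for i, s := range spans {
		// Only the first span may start at a negative offset.
		if i > 0 && s.offset < 0 {
			return errNegativeOffset
		}
		want += uint64(s.length)
	}
	if want != uint64(numBuckets) {
		return errBucketMismatch
	}
	return nil
}

func main() {
	fmt.Println(checkSpans([]span{{0, 2}, {1, 1}}, 3))
	fmt.Println(checkSpans([]span{{0, 2}}, 1))
	fmt.Println(checkSpans([]span{{0, 1}, {-1, 1}}, 2))
}
```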
Signed-off-by: beorn7 <beorn@grafana.com>
* drop extra label from receiver
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* used constant
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
---------
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>