* OTLP receiver: Don't append _total suffix to non-monotonic OTel sums
Fix the OTLP receiver so the suffix _total isn't appended to metrics
converted from non-monotonic OTel sum metrics, if otlp.translation_strategy is
UnderscoreEscapingWithSuffixes or NoUTF8EscapingWithSuffixes.
Also add translation tests.
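
The naming rule described above can be sketched roughly as follows. This is a minimal illustration only: `sumMetric`, `promName` and the `addSuffixes` switch are hypothetical names that stand in for the real translation code and for the two "...WithSuffixes" strategies.

```go
package main

import (
	"fmt"
	"strings"
)

// sumMetric is a hypothetical, simplified stand-in for an OTel sum metric.
type sumMetric struct {
	Name        string
	IsMonotonic bool
}

// promName returns the translated metric name (hypothetical helper).
// addSuffixes models the two "...WithSuffixes" translation strategies.
// Only monotonic sums map to counters, so only they receive _total.
func promName(m sumMetric, addSuffixes bool) string {
	if addSuffixes && m.IsMonotonic && !strings.HasSuffix(m.Name, "_total") {
		return m.Name + "_total"
	}
	return m.Name
}

func main() {
	fmt.Println(promName(sumMetric{Name: "http_requests", IsMonotonic: true}, true)) // http_requests_total
	fmt.Println(promName(sumMetric{Name: "queue_size", IsMonotonic: false}, true))   // queue_size
}
```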
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
- Wrapped existing test logic in a loop to run with both protocol versions
- Ensures consistent behavior across protocol versions for dropping old time series
Signed-off-by: AxcelXander <tyz666@bu.edu>
Co-authored-by: AxcelXander <tyz666@bu.edu>
* Update otlptranslator with new API
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
---------
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Add support for promoting all OTel resource attributes via `promote_all_resource_attributes`,
except for those ignored using `ignore_resource_attributes`.
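
A rough sketch of the intended selection logic, assuming resource attributes are a plain string map. `promoteAttributes` and its parameters are hypothetical and only mirror the two settings named above; this is not the actual implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// promoteAttributes is a hypothetical helper: when promoteAll is set, every
// resource attribute is promoted except the ones listed in ignore.
func promoteAttributes(resource map[string]string, promoteAll bool, ignore []string) map[string]string {
	out := map[string]string{}
	if !promoteAll {
		return out
	}
	ignored := make(map[string]struct{}, len(ignore))
	for _, k := range ignore {
		ignored[k] = struct{}{}
	}
	for k, v := range resource {
		if _, skip := ignored[k]; skip {
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	res := map[string]string{"service.name": "api", "host.name": "h1", "k8s.pod.uid": "abc"}
	promoted := promoteAttributes(res, true, []string{"k8s.pod.uid"})

	// Sort keys so the output is stable.
	keys := make([]string, 0, len(promoted))
	for k := range promoted {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // [host.name service.name]
}
```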
---------
Signed-off-by: Antonio Jimenez <antonjim@thousandEyes.com>
Signed-off-by: Antonio Jimenez <123171955+antonjim-te@users.noreply.github.com>
* feat: Support 'NoTranslation' mode in OTLP endpoint
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
---------
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Ref: https://github.com/prometheus/prometheus/issues/15021
Also modified spansToSpansProto to not allocate empty bucket spans array
when converting internal model to remote write model.
Otherwise the test TestDecodeWriteV2Request fails since empty array
is marshaled/unmarshaled as nil so we don't get back the exact same
thing.
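
A minimal sketch of the idea, using simplified stand-in types (`span`, `protoSpan`) and a hypothetical `spansToProto` function rather than the real code: returning nil for an empty input keeps the value identical to what comes back after a marshal/unmarshal round trip.

```go
package main

import "fmt"

// span and protoSpan are simplified stand-ins for the internal model and the
// remote write model; they are assumptions made for this example.
type span struct {
	Offset int32
	Length uint32
}

type protoSpan struct {
	Offset int32
	Length uint32
}

// spansToProto returns nil instead of an allocated empty slice for empty input.
func spansToProto(in []span) []protoSpan {
	if len(in) == 0 {
		return nil
	}
	out := make([]protoSpan, 0, len(in))
	for _, s := range in {
		out = append(out, protoSpan{Offset: s.Offset, Length: s.Length})
	}
	return out
}

func main() {
	fmt.Println(spansToProto(nil) == nil, spansToProto([]span{}) == nil) // true true
	fmt.Println(len(spansToProto([]span{{Offset: 1, Length: 2}})))       // 1
}
```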
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Fix storage/remote.pool interned refs count and flaky test
I saw TestIntern_MultiRef_Concurrent failing on a different PR saying 'expected refs to be 1 but it was 2'.
I took a look, and it definitely can be racy, especially with a time.Sleep() of just 1ms.
I'm fixing that by explicitly waiting until it has been released, and by repeating that 1000 times, otherwise it's just a recipe for a future flaky test.
OTOH, I also took a look at the implementation and saw that we were not holding the RLock() when increasing the references count, so there was a race condition with the cleanup when releasing. I fixed that by holding RLock() while increasing the references count.
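
For illustration, the locking pattern could look roughly like the sketch below. The `pool`, `entry`, `intern`, `release` and `refs` names are assumptions for this example and are not copied from storage/remote; the point is only that the increment happens while RLock() is held and that release re-checks under the write lock.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type entry struct {
	refs atomic.Int64
	s    string
}

type pool struct {
	mtx  sync.RWMutex
	pool map[string]*entry
}

func newPool() *pool { return &pool{pool: map[string]*entry{}} }

func (p *pool) intern(s string) string {
	p.mtx.RLock()
	if e, ok := p.pool[s]; ok {
		// Increment while still holding RLock, so a concurrent release
		// cannot delete the entry between the lookup and the increment.
		e.refs.Add(1)
		p.mtx.RUnlock()
		return e.s
	}
	p.mtx.RUnlock()

	p.mtx.Lock()
	defer p.mtx.Unlock()
	if e, ok := p.pool[s]; ok {
		e.refs.Add(1)
		return e.s
	}
	e := &entry{s: s}
	e.refs.Store(1)
	p.pool[s] = e
	return s
}

func (p *pool) release(s string) {
	p.mtx.RLock()
	e, ok := p.pool[s]
	p.mtx.RUnlock()
	if !ok || e.refs.Add(-1) > 0 {
		return
	}
	p.mtx.Lock()
	defer p.mtx.Unlock()
	// Re-check under the write lock: the entry may have been revived or replaced.
	if cur, ok := p.pool[s]; !ok || cur != e || e.refs.Load() != 0 {
		return
	}
	delete(p.pool, s)
}

// refs reports the current reference count, or 0 if s is not interned.
func (p *pool) refs(s string) int64 {
	p.mtx.RLock()
	defer p.mtx.RUnlock()
	if e, ok := p.pool[s]; ok {
		return e.refs.Load()
	}
	return 0
}

func main() {
	p := newPool()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.intern("job")
			p.release("job")
		}()
	}
	wg.Wait()
	fmt.Println(p.refs("job")) // 0
}
```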
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* s/Equalf/Equal/
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
---------
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
* Remove experimental out-of-order native histogram flag
This feature has been available in Prometheus since September 2024,
and has no known issues. Therefore proposing to remove the flag
entirely and always have it on. Note that there are still two
settings that need to be configured (out-of-order time window > 0
and native histograms enabled) for this feature to work.
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
* Update CHANGELOG
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
* Keep feature flag with warning
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
* Update CHANGELOG
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
* Update tsdb/head_append.go
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Update CHANGELOG.md
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Update tsdb/head_append.go
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
* Additional cleanup of comments and test names
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
---------
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Global and Data Source configurations can specify legacy mode, but Prometheus now requires that the overall validation mode be set to UTF-8.
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Rationales:
* metadata-wal-records might be deprecated and replaced going forward: https://github.com/prometheus/prometheus/issues/15911
* PRW 2.0 works without metadata just fine (although it sends untyped metrics as expected).
Signed-off-by: bwplotka <bwplotka@gmail.com>
There was a bug (due to confusion?) in the local metadata cache, which is keyed
by metric family, not by the series metric name. The fix is to NOT use that local cache
at all (it's still needed for the current metadata API implementation; added a TODO
on how we can get rid of it).
I went ahead and also renamed the Metric field in metadata structs to MetricFamily to make
clear it's not always __name__.
Signed-off-by: bwplotka <bwplotka@gmail.com>
Found during testing for
https://github.com/grafana/mimir/issues/9072
Debug printout showed:
KRAJO: seriesName=cortex_request_duration_seconds_bucket,
metricFamily=cortex_request_duration_seconds_bucket,
type=GAUGE,
help=cortex_bucket_index_load_duration_seconds_sum,
unit=
which is nonsense.
I can imagine more cases where this happens and where it actually makes sense.
Some targets might be missing metadata, or there might be a pipeline that loses it.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
What
Adds support for OTLP delta temporality to the OTLP endpoint.
This is done by calling the deltatocumulative processor from the OpenTelemetry collector during OTLP conversion.
Why
Delta conversion is a naturally stateful process, which requires careful request routing when operated inside a collector.
Prometheus is already stateful and doing the conversion in-server reduces the operational burden on the ingest architecture by only having one stateful component.
How
deltatocumulative is an OTel collector component that works as follows:
* pmetric.Metrics come from a receiver or in this case from the HTTP client
* It operates as an in-place update loop:
  * for each sample, if not delta, leave unmodified
  * if delta, do:
    * state += sample, where state is the in-memory sum of all previous samples
    * sample = state, sample value is now cumulative
* this is supported for sums (counters), gauges, histograms (old histograms) and exponential histograms (native histograms)
If a series receives no new samples for 5m, its state is removed from memory.
Performance
Delta conversion is a stateful operation and the OTel code is not highly optimized yet, e.g. it locks the entire processor for each request. Nonetheless, care has been taken to mitigate those effects:
* delta conversion is behind a feature flag. If disabled, no conversion code is ever invoked
* if enabled, conversion is not invoked if the request does not actually contain delta samples. This leads to no measurable performance difference between default-cumulative and convert-cumulative (only cumulative, feature on/off)
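
The two guards above could be expressed along these lines. This is a hedged sketch: `metric`, `temporality`, `hasDelta` and `needsConversion` are hypothetical names, and the real code inspects pmetric.Metrics rather than a flat slice.

```go
package main

import "fmt"

type temporality int

const (
	cumulative temporality = iota
	delta
)

// metric is a hypothetical, flattened view of one metric in a request.
type metric struct {
	Name        string
	Temporality temporality
}

// hasDelta reports whether any metric in the request uses delta temporality.
func hasDelta(ms []metric) bool {
	for _, m := range ms {
		if m.Temporality == delta {
			return true
		}
	}
	return false
}

// needsConversion is true only when the feature is enabled and the request
// actually carries delta samples, so cumulative-only traffic skips the
// stateful (and locking) conversion path entirely.
func needsConversion(featureEnabled bool, ms []metric) bool {
	return featureEnabled && hasDelta(ms)
}

func main() {
	cumulativeOnly := []metric{{Name: "a", Temporality: cumulative}}
	mixed := []metric{{Name: "a", Temporality: cumulative}, {Name: "b", Temporality: delta}}

	fmt.Println(needsConversion(false, mixed))         // false
	fmt.Println(needsConversion(true, cumulativeOnly)) // false
	fmt.Println(needsConversion(true, mixed))          // true
}
```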
Signed-off-by: sh0rez <me@shorez.de>
Fix issues raised by staticcheck
We are not enabling staticcheck explicitly, though, because it has too many false positives.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
BuildCompliantName was renamed to BuildCompliantMetricName, and it no longer takes UTF8 support into consideration. It focuses on building a metric name that follows Prometheus conventions.
A new function, BuildMetricName, was added to optionally add unit and type suffixes to OTLP metric names without translating any characters to underscores (_).