The last permutation of the translation options does underscore translation but does not add suffixes: for example, an OTel metric `http.server.duration` with unit seconds becomes `http_server_duration` rather than `http_server_duration_seconds`.
This translation option already exists in Mimir as `otel_metric_suffixes_enabled`, indicating external demand for this strategy.
There is an accompanying update to prometheus-docs to explain the use of this mode: https://github.com/prometheus/docs/pull/2688
Signed-off-by: Owen Williams <owen.williams@grafana.com>
As `histogram_count` is playing tricks to improve performance, we
had better make sure that the limitation of extrapolation below zero
still works as expected.
Signed-off-by: beorn7 <beorn@grafana.com>
Presumably, this will help with Loki alerts, but the added functionality is also generally useful.
For one, this enables `parseDuration` to also accept negative durations (as those are also used in PromQL by now).
This also adds a function `now` to return the evaluation time of the template (as seconds since epoch, AKA Unix time), and a function `toDuration` (akin to `toTime`), which creates a Go `time.Duration` from a duration in seconds.
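As a sketch of how these could be combined in an alert template (hypothetical usage, not taken from the patch):
```
{{/* Evaluation time of the template, as Unix seconds: */}}
{{ now }}

{{/* parseDuration now accepts negative durations and returns seconds;
     toDuration turns those seconds into a Go time.Duration: */}}
{{ toDuration (parseDuration "-1h30m") }}
```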
---------
Signed-off-by: Dmitry Ponomaryov <me@halje.ru>
Signed-off-by: Dmitry Ponomaryov <iamhalje@gmail.com>
This RC reverts the feature "OTLP: Support promoting OTel scope attributes".
Add the line back into the CHANGELOG for 3.5.0-rc.0, since we are not changing that version.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This deals with the count field of native histograms in the same way
as with simple float counters. It then scales the whole histogram by
the same factor by which it has scaled the count. This will still allow
individual buckets to get extrapolated below zero, but maybe that is
fine.
This implements approach (2) as described in
https://github.com/prometheus/prometheus/issues/15976#issuecomment-3032095158
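A simplified sketch of the scaling step (names and the flat bucket representation are illustrative, not the actual PromQL engine types):
```
// scaleHistogram illustrates approach (2): the count of a native
// histogram gets the same extrapolation limit as a simple float
// counter, and the correction factor implied by that limit is then
// applied to every bucket. Individual buckets may still end up
// extrapolated below zero. The limiting of the count itself is taken
// as given here, as it reuses the existing float-counter logic.
func scaleHistogram(unlimitedCount, limitedCount float64, buckets []float64) []float64 {
	if unlimitedCount == 0 {
		return buckets // nothing to scale
	}
	factor := limitedCount / unlimitedCount
	scaled := make([]float64, len(buckets))
	for i, b := range buckets {
		scaled[i] = b * factor
	}
	return scaled
}
```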
Signed-off-by: beorn7 <beorn@grafana.com>
Reverts #16730 and #16760
This is being done because we've noticed a problem in the spec that could
lead to name collisions: if scope attributes named `name`, `version`, or
`schema_url` are promoted, they would collide with the already reserved
labels `otel_scope_name`, `otel_scope_version`, and `otel_scope_schema_url`.
Since this new configuration option never made it into a release, we can
safely remove it from the 3.5 release. We'll sort this out for the 3.6 release.
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Add links to the sources of truth.
These values are hard to keep up to date; the "go" one, for example,
is "wrong" (not really, as an old 1.22 binary could still
download and use newer toolchains).
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
This shows how float counters cannot go below zero when extrapolating
for rate/increase, and how histograms do not have that protection yet,
leading to an overestimation of the rate/increase.
This also demonstrates edge cases where the count extrapolation does
not need to be limited, but an individual bucket still goes below
zero.
Signed-off-by: beorn7 <beorn@grafana.com>
This commit adds Projection metadata to SelectHints so that downstream
storage implementations can use it to save effort when answering
Select calls.
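A hedged sketch of how a storage implementation might consume such hints; the real type is `storage.SelectHints`, but the projection field names below are assumptions, not the actual API:
```
// selectHints models only the relevant part of storage.SelectHints.
type selectHints struct {
	// Labels the query will actually use, e.g. the "by" labels of
	// sum by (job) (...). Field names here are hypothetical.
	ProjectionLabels  []string
	ProjectionInclude bool // include list vs. exclude list
}

// neededLabel reports whether Select has to return the given label at
// all; labels the engine will drop anyway can be skipped when decoding
// series, saving effort in the storage implementation.
func neededLabel(h *selectHints, name string) bool {
	for _, l := range h.ProjectionLabels {
		if l == name {
			return h.ProjectionInclude
		}
	}
	return !h.ProjectionInclude
}
```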
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
* OTLP receiver: Generate target_info samples between the earliest and latest samples per resource
Modify the OTLP receiver to generate target_info samples between the earliest
and latest samples per resource instead of only one for the latest timestamp.
The samples are spaced half the lookback delta apart.
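A minimal sketch of the spacing (function and variable names are illustrative, not the receiver's actual code):
```
// targetInfoTimestamps returns the timestamps (in milliseconds) at
// which target_info samples are written for one resource: from the
// earliest to the latest sample timestamp, spaced half the lookback
// delta apart, so the series never goes stale in between.
func targetInfoTimestamps(earliest, latest, lookbackDeltaMs int64) []int64 {
	step := lookbackDeltaMs / 2
	if step <= 0 {
		return []int64{latest}
	}
	var ts []int64
	for t := earliest; t < latest; t += step {
		ts = append(ts, t)
	}
	return append(ts, latest) // always cover the latest sample
}
```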
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Add `ByteSize()` method to different labels implementations.
One use case is tracking the memory used by Labels.
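Hypothetical usage (the exact return type of `ByteSize()` is an assumption here):
```
// Track the memory held by the label sets of all active series,
// e.g. to expose it as a self-monitoring metric.
var total uint64
for _, lbls := range labelSets { // labelSets is illustrative
	total += uint64(lbls.ByteSize())
}
```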
Signed-off-by: Jon Kartago Lamida <me@lamida.net>
If we call ApplyConfig() at the same time the manager is being stopped, we might end up hanging forever.
This is because ApplyConfig() will try to cancel obsolete providers and wait until they are cancelled.
It's done by setting a done() function that calls Done() on a sync.WaitGroup:
```
if len(prov.newSubs) == 0 {
	wg.Add(1)
	prov.done = func() {
		wg.Done()
	}
}
```
It then calls prov.cancel() and finally waits, via a blocking wg.Wait() call,
until all providers have run their done() function.
For each provider there is a goroutine created by calling Manager.startProvider(*Provider):
```
func (m *Manager) startProvider(ctx context.Context, p *Provider) {
	m.logger.Debug("Starting provider", "provider", p.name, "subs", fmt.Sprintf("%v", p.subs))
	ctx, cancel := context.WithCancel(ctx)
	updates := make(chan []*targetgroup.Group)
	p.mu.Lock()
	p.cancel = cancel
	p.mu.Unlock()
	go p.d.Run(ctx, updates)
	go m.updater(ctx, p, updates)
}
```
It creates a context that can be cancelled and that cancel function becomes prov.cancel. This is what ApplyConfig will call.
If we look at the body of updater() method:
```
func (m *Manager) updater(ctx context.Context, p *Provider, updates chan []*targetgroup.Group) {
	// Ensure targets from this provider are cleaned up.
	defer m.cleaner(p)
	for {
		select {
		case <-ctx.Done():
			return
		[...]
```
we can see that it will exit if that context is cancelled and that will trigger a call to Manager.cleaner().
That cleaner() is where done() is called.
So ApplyConfig() -> calls cancel() -> causes cleaner() to be executed -> calls done().
cancel() is also called from the cancelDiscoverers() method, which Manager.Run() calls when the Manager is stopping:
```
func (m *Manager) Run() error {
	go m.sender()
	<-m.ctx.Done()
	m.cancelDiscoverers()
	return m.ctx.Err()
}
```
The problem is that if we call both ApplyConfig and stop the manager at the same time we might end up with:
- We call Manager.ApplyConfig()
- We stop the Manager
- Manager.cancelDiscoverers() is called
- Provider.cancel() is called for every Provider
- cancel() cancels the provider context, which terminates updater() for the given Provider
- The cancelled context causes the cleaner() method to be called for the given Provider
- cleaner() calls done() and exits
- The Provider is considered stopped at this point; there is no goroutine running that will call done() anymore
- ApplyConfig iterates providers and decides that one is obsolete and must be stopped
- It sets a custom done() function body with a WaitGroup.Done() call in it
- Then ApplyConfig waits until all Providers have run done()
- But they are all stopped, so no done() will ever run
- We wait forever
This only happens if cancelDiscoverers() runs before ApplyConfig: if ApplyConfig runs first, done() will be called;
if cancelDiscoverers() runs first, it stops the updater() instances, so done() will never be called.
Part of the problem is that there is no distinction between running and stopped providers. There is a Provider.IsStarted() method
that returns a bool based on the value of the cancel function, but ApplyConfig doesn't check it.
The second problem is that, although there is a mutex on a Provider, it's not used much in the code, so two goroutines can try to read and/or write
provider.cancel and/or provider.done at the same time, making it all more likely to race.
The easiest way to fix it is to check inside ApplyConfig whether the provider is started, so we don't try to stop a provider that's already stopped.
For that we need to mark it as stopped after cancel() is called, by setting cancel to nil.
This also needs better lock usage to avoid different parts of the code trying to set cancel and done at the same time.
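A sketch of the resulting logic (simplified model, not the exact patch):
```
// stopObsolete is roughly what ApplyConfig does for an obsolete
// provider with the fix in place: done() is only registered and
// cancel() only called while holding the provider mutex, and a nil
// cancel marks the provider as already stopped, so ApplyConfig never
// waits on a provider whose updater() goroutine is already gone.
func stopObsolete(p *Provider, wg *sync.WaitGroup) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.cancel == nil {
		return // already stopped by cancelDiscoverers(), nothing will call done()
	}
	wg.Add(1)
	p.done = func() { wg.Done() }
	p.cancel()
	p.cancel = nil // mark as stopped
}
```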
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Doing a config reload that needs to stop some providers while also sending SIGTERM to Prometheus at the same time can sometimes hang:
1: sync.WaitGroup.Wait [83 minutes] [Created by run.(*Group).Run in goroutine 1 @ group.go:37]
sync sema.go:110 runtime_SemacquireWaitGroup(*uint32(#166))
sync waitgroup.go:118 (*WaitGroup).Wait(*WaitGroup(#23))
discovery manager.go:276 (*Manager).ApplyConfig(#23, #167)
main main.go:964 main.func5(#120)
main main.go:1505 reloadConfig({#183, 0x1b}, 1, #40, #43, #50, {#31, 0xa, 0})
main main.go:1182 main.func22()
run group.go:38 (*Group).Run.func1(*Group(#26), #51)
Add a test for it.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* Preserve source files in codemirror-promql package
This allows sourcemaps to work when the package is imported via ESM-native CDNs such as esm.sh
Signed-off-by: wmTJc9IK0Q <171362836+wmTJc9IK0Q@users.noreply.github.com>
* Preserve source files in lezer-promql package
Signed-off-by: wmTJc9IK0Q <171362836+wmTJc9IK0Q@users.noreply.github.com>
---------
Signed-off-by: wmTJc9IK0Q <171362836+wmTJc9IK0Q@users.noreply.github.com>