* promtool: allow cardinality checks together with metrics linting and add --lint to check metrics
Signed-off-by: ADITYA TIWARI <adityatiwari342005@gmail.com>
* fix/ci: Simplify test case variable declaration
Remove unnecessary variable declaration in test cases.
Signed-off-by: ADITYA TIWARI <142050150+ADITYATIWARI342005@users.noreply.github.com>
* promtool: avoid Tee for --lint=none
Signed-off-by: ADITYA TIWARI <adityatiwari342005@gmail.com>
* promtool: validate at least one feature enabled in check metrics
Addresses review feedback to ensure the command does something useful:
it now fails with a clear error when --lint=none is set and the --extended flag is not given.
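A sketch of the resulting usage, following the curl-pipe pattern from the promtool docs (the --lint syntax is taken from this change's description; details are assumptions):
```
# Lint the exposition and also report cardinality statistics.
curl -s http://localhost:9090/metrics | promtool check metrics --extended

# Skip linting but still get cardinality statistics.
curl -s http://localhost:9090/metrics | promtool check metrics --lint=none --extended

# Rejected: --lint=none without --extended leaves the command nothing to do.
curl -s http://localhost:9090/metrics | promtool check metrics --lint=none
```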
Signed-off-by: ADITYA TIWARI <adityatiwari342005@gmail.com>
---------
Signed-off-by: ADITYA TIWARI <adityatiwari342005@gmail.com>
Signed-off-by: ADITYA TIWARI <142050150+ADITYATIWARI342005@users.noreply.github.com>
* Delay compactions until Thanos uploads all blocks
Using the Thanos sidecar with Prometheus requires us to disable TSDB compactions on the Prometheus side by setting --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the same value. See https://thanos.io/tip/components/sidecar.md. The main problem this avoids is that Prometheus might compact a given block before Thanos uploads it, creating a gap in Thanos metrics. Thanos does not upload compacted blocks because that would upload the same sample multiple times. You can tell Thanos to upload compacted blocks, but that is aimed at one-time migrations. This patch creates a bridge between Thanos and Prometheus by allowing Prometheus to read the shipper file Thanos creates, where it tracks which blocks were already uploaded, and using that data to delay compaction of blocks until they are marked as uploaded by Thanos. Thanks to this, both services can coordinate with each other (in a way) and we can stop disabling compaction on the Prometheus side when Thanos uploads are enabled.
The reason to have this is that disabling compactions has a very dramatic performance cost. Since most time series exist for longer than a single block duration (2h by default), large chunks of block index will reference the same series, so 10 * 2h blocks will each have an index that is usually fairly big and is almost the same for all 10 blocks. Compaction de-duplicates the index, so merging 10 blocks together would leave us with a single index that is around the same size as each of these 10 2h blocks would have (plus some extra for series that exist only in some blocks, but not all). Every range query that iterates over all 10 blocks then has to read each index, so we're doing 10x more work than if we had a single compacted block.
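For context, a sketch of the workaround this patch makes unnecessary; the sidecar setup previously pinned both block-duration flags to the same value so Prometheus never compacts locally (flag names as above, values illustrative):
```
# Previous workaround: disable local compaction so the sidecar can upload
# every 2h block before it would be compacted away.
prometheus \
  --storage.tsdb.min-block-duration=2h \
  --storage.tsdb.max-block-duration=2h \
  ...
```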
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* Rename structs and functions to make this more generic
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* Address review comments
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* Cache UploadMeta for 1 minute
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
---------
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* add feature flag for remote write v2
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* change from number to protobuf_message
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* fix test
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* fix name
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* run make cli-documentation
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* fix help
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* run make cli-documentation
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
---------
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* Add anchored and smoothed to vector selectors.
This adds "anchored" and "smoothed" keywords that can be used following a matrix selector.
"Anchored" selects the last point before the range (or the first one after the range) and adds it at the boundary of the matrix selector.
"Smoothed" applies linear interpolation at the edges using the points around the edges. In the absence of a point before or after the edge, the first or the last point is added to the edge, without interpolation.
*Example usage*
* `increase(caddy_http_requests_total[5m] anchored)` (equivalent to *caddy_http_requests_total - caddy_http_requests_total offset 5m* but takes counter resets into consideration)
* `rate(caddy_http_requests_total[step()] smoothed)`
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* Update docs/feature_flags.md
Co-authored-by: Charles Korn <charleskorn@users.noreply.github.com>
Signed-off-by: Julien <291750+roidelapluie@users.noreply.github.com>
* Smoothed/Anchored rate: Add more tests
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* Anchored/Smoothed modifier: error out with histograms
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
---------
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
Signed-off-by: Julien <291750+roidelapluie@users.noreply.github.com>
Co-authored-by: Charles Korn <charleskorn@users.noreply.github.com>
State explicitly what kind of timestamps are expected for the
--min-time and --max-time options of promtool tsdb commands.
This is especially important for the dump-openmetrics command as users
could otherwise mistakenly think it would be in seconds, like the
OpenMetrics timestamps themselves.
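For example (a sketch; the data directory is a placeholder and the timestamps are Unix milliseconds):
```
# Dump one day of data; --min-time/--max-time take millisecond Unix
# timestamps, even though the OpenMetrics output itself uses seconds.
promtool tsdb dump-openmetrics \
  --min-time=1700000000000 \
  --max-time=1700086400000 \
  /path/to/prometheus/data
```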
Signed-off-by: Nicolas Peugnet <nicolas.peugnet@lip6.fr>
These feature flags are supported in the main prometheus binary but
weren't supported in promtool.
Fixes #16412.
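A hypothetical invocation, assuming the flags are exposed the same way as in the main binary via --enable-feature (the subcommand and feature name below are illustrative, not taken from the change itself):
```
# Run rule unit tests with the same feature flags the server runs with.
promtool test rules --enable-feature=promql-experimental-functions tests.yml
```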
Signed-off-by: David Leadbeater <dgl@dgl.cx>
Updated the parser to allow calculations in PromQL durations.
This enables durations in the form of:
rate(http_requests_total[10m+2s])
The calculations are evaluated directly at parse time and never reach the PromQL engine.
The lexer has also been updated and improved, in particular for subqueries.
Bugfix: rate(http_requests_total[0]) is no longer allowed.
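A few more hedged examples, assuming subtraction and multiplication are accepted alongside the addition shown above:
```
rate(http_requests_total[10m+2s])   # same as [10m2s]
rate(http_requests_total[1h-10m])   # same as [50m]
rate(http_requests_total[2*30m])    # same as [1h]
```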
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* ruler notifier: make batch size configurable
In Mimir we experimented with setting a higher value for the batch size.
A 4x increase in batch size decreased the time to process a single notification by about 2x.
This reduces the processing time of the notifications queue and increases the throughput of the queue.
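As a sketch, assuming the setting is surfaced as the `--alertmanager.notification-batch-size` command-line flag documented in current Prometheus (value illustrative):
```
# Send up to 256 alerts per notification to Alertmanager.
prometheus --alertmanager.notification-batch-size=256 ...
```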
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update cmd/prometheus/main.go
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update docs
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Use a string constant
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add godoc comment on exported constant
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
---------
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Add --ignore-unknown-fields that ignores unknown fields in rule group
files. There are lots of tools in the ecosystem that "like" to extend
the rule group file structure, but such files are currently unreadable by
promtool if there's anything extra. The purpose of this flag is to let us
use the "vanilla" promtool instead of rolling our own.
Some examples of tools/code:
https://github.com/grafana/mimir/blob/main/pkg/mimirtool/rules/rwrulefmt/rulefmt.go
8898eb3cc5/pkg/rules/rules.go (L18-L25)
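A sketch of the intended usage, assuming the flag is wired into `promtool check rules` (file name is a placeholder):
```
# Accept rule group files that carry extra, tool-specific fields.
promtool check rules --ignore-unknown-fields rules.yml
```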
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
What
Adds support for OTLP delta temporality to the OTLP endpoint.
This is done by calling the deltatocumulative processor from the OpenTelemetry collector during OTLP conversion.
Why
Delta conversion is a naturally stateful process, which requires careful request routing when operated inside a collector.
Prometheus is already stateful and doing the conversion in-server reduces the operational burden on the ingest architecture by only having one stateful component.
How
deltatocumulative is an OTel collector component that works as follows:
* pmetric.Metrics come from a receiver or in this case from the HTTP client
* It operates as an in-place update loop:
* for each sample, if not delta, leave unmodified
* if delta, do:
* state += sample, where state is the in-memory sum of all previous samples
* sample = state, sample value is now cumulative
* this is supported for sums (counters), gauges, histograms (old histograms) and exponential histograms (native histograms)
If a series receives no new samples for 5m, its state is removed from memory.
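A small worked example of the update loop above (illustrative values for one delta counter series):
```
incoming delta samples:      4, 3, 5
state after each sample:     4, 7, 12
emitted cumulative samples:  4, 7, 12
```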
Performance
Delta conversion is a stateful operation and the OTel code is not highly optimized yet, e.g. it locks the entire processor for each request. Nonetheless, care has been taken to mitigate those effects:
* delta conversion is behind a feature flag. If disabled, no conversion code is ever invoked.
* if enabled, conversion is not invoked if the request does not actually contain delta samples. This leads to no measurable performance difference between default-cumulative and convert-cumulative (cumulative-only data, feature off vs. on).
Signed-off-by: sh0rez <me@shorez.de>
Enable the `auto-gomaxprocs` feature flag by default.
* Add command line flag `--no-auto-gomaxprocs` to disable.
Signed-off-by: SuperQ <superq@gmail.com>
Enable the `auto-gomemlimit` feature flag by default.
* Add command line flag `--no-auto-gomemlimit` to disable.
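To keep the previous behaviour, both defaults can be switched off explicitly using the flags introduced above:
```
prometheus --no-auto-gomaxprocs --no-auto-gomemlimit ...
```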
Signed-off-by: SuperQ <superq@gmail.com>
* promtool: Add debug flag for rule tests
This makes it print out the tsdb state (both input_series and rules that
are run) at the end of a test, making reasoning about tests much easier.
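A hypothetical invocation, assuming the flag is exposed as `--debug` on `promtool test rules` (file name is a placeholder):
```
# Print the TSDB state (input_series plus evaluated rules) after the test run.
promtool test rules --debug rules_test.yml
```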
Signed-off-by: David Leadbeater <dgl@dgl.cx>
* Reuse generated test name from junit testing
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: David Leadbeater <dgl@dgl.cx>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: David Leadbeater <dgl@dgl.cx>
The OTLP receiver can now be considered stable. It has been in main for
longer than a year and has received constant improvements.
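For reference, a sketch of enabling the receiver; in current releases it is a regular flag (`--web.enable-otlp-receiver`) rather than the old `--enable-feature=otlp-write-receiver`, though whether this particular change performs that rename is not stated here:
```
# Expose the OTLP ingestion endpoint at /api/v1/otlp/v1/metrics.
prometheus --web.enable-otlp-receiver ...
```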
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Conflicts:
cmd/prometheus/main.go
docs/command-line/prometheus.md
docs/feature_flags.md
web/ui/build_ui.sh
web/web.go
Resolved by dropping the UTF-8 feature flag and adding the
`auto-reload-config` feature flag.
For the new web ui pick all changes from `main`.
Fix call to newTestEngine(t) in promql/engine_test.go:3214.
The `agent` feature flag is its own command-line flag now.
Remove `scrape.name-escaping-scheme` argument.
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
PromQL engine: Delay deletion of __name__ label to the end of the query evaluation
- This change allows optionally preserving the `__name__` label via the `label_replace` and `label_join` functions, and helps prevent the dreaded "vector cannot contain metrics with the same labelset" error.
- The implementation extends the `Series` and `Sample` structs with a boolean flag indicating whether the `__name__` label should be deleted at the end of the query evaluation.
- The `label_replace` and `label_join` functions can still access the value of the `__name__` label, even if it has been previously marked for deletion. If `__name__` is used as the target label, it won't be dropped at the end of the query evaluation.
- Fixes https://github.com/prometheus/prometheus/issues/11397
- See https://github.com/jcreixell/prometheus/pull/2 for previous discussion, including the decision to create this PR and benchmark it before considering other alternatives (like refactoring `labels.Labels`).
- See https://github.com/jcreixell/prometheus/pull/1 for an alternative implementation using a special label instead of boolean flags.
- Note: a feature flag `promql-delayed-name-removal` has been added as it changes the behavior of some "weird" queries (see https://github.com/prometheus/prometheus/issues/11397#issuecomment-1451998792)
Example (this always fails, as `__name__` is being dropped by `count_over_time`):
```
count_over_time({__name__!=""}[1m])
=> Error executing query: vector cannot contain metrics with the same labelset
```
Before:
```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")
=> Error executing query: vector cannot contain metrics with the same labelset
```
After:
```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")
=>
count_go_gc_cycles_automatic_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
count_go_gc_cycles_forced_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
...
```
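The new behaviour is opt-in via the feature flag named above:
```
prometheus --enable-feature=promql-delayed-name-removal ...
```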
Signed-off-by: Jorge Creixell <jcreixell@gmail.com>
---------
Signed-off-by: Jorge Creixell <jcreixell@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Delay compactions by a random duration to avoid simultaneous compactions and reduce stress on shared resources.
This is enabled via `--enable-feature=delayed-compaction`.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>