209 Commits

Author SHA1 Message Date
Jeanette Tan
796b1bbfde Merge branch 'main' into nhcb
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-05-08 19:11:39 +08:00
Arthur Silva Sens
7aacef9b42
bugfix: Decouple native histogram ingestions and protobuf parsing
Up until this point, if a scrape was done with the protobuf format Prometheus would always try to ingest native histograms even with the feature flag disabled. This causes problems with other feature-flags that depend on the protobuf format, like 'created-timestamp-zero-ingestion'. This commit decouples native histogram parsing from ingestion, making sure ingestion only happens when the 'native-histogram' feature-flag is enabled.

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
2024-04-24 17:02:52 -03:00
György Krajcsovits
bcafa5f1f9 Merge remote-tracking branch 'upstream/main' into update-nhcb 2024-04-24 11:06:59 +02:00
Matthieu MOREL
d496687c8e golangci-lint: enable usestdlibvars linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-04-08 19:26:23 +00:00
György Krajcsovits
2a4aa085d2 Merge branch 'main' into nhcb 2024-03-27 18:42:10 +01:00
Ziqi Zhao
64dfd8a158
fix the bug of setting native histogram min bucket factor (#13846)
* fix the bug of setting native histogram min bucket factor

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>

* Add unit test for checking that min_bucket_factor is correctly applied

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

---------

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-03-27 16:32:37 +01:00
György Krajcsovits
a3d1a46eda Merge branch 'main' into nhcb 2024-03-22 14:51:48 +01:00
Bryan Boreham
5ed21c0d76
Merge pull request #12933 from prymitive/duplicated_samples
When Prometheus scrapes a target and it sees the same time series repeated multiple times it currently silently ignores that. This change adds a test for that and fixes the scrape loop so that:

* Only first sample for each unique time series is appended
* Duplicated samples increment the prometheus_target_scrapes_sample_duplicate_timestamp_total metric
This allows one to identify such scrape jobs and targets.

Also fix some tests and benchmark.
2024-03-16 09:18:46 +00:00
Bryan Boreham
6c41ec984f [BUGFIX] Scraping: Tolerance should be max 1% of interval
Previous code set it at minimum 1%, which was not intended.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-08 10:18:18 +00:00
Julien
fcdb6c2b14
Merge pull request #13624 from roidelapluie/tolerance
Always align scrapes even if tolerance is bigger than 1% of scrape interval
2024-03-04 16:37:00 +01:00
machine424
f477e0539a
Move from golang.org/x/exp/slices into slices now that we only support Go >= 1.21
Prevent adding back golang.org/x/exp/slices.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-02-28 14:54:53 +01:00
György Krajcsovits
5d0a0a7542 Add custom buckets to native histogram model (#13592)
* add custom buckets to native histogram model
* simple copy for custom bounds
* return errors for unsupported add/sub operations
* add test cases for string and update appendhistogram in scrape to account for new schema
* check fields which are supposed to be unused but may affect results in equals
* allow appending custom buckets histograms regardless of max schema

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-02-28 14:06:43 +01:00
Łukasz Mierzwa
c013a3c1b5 Check of duplicated samples directly in scrapeCache.get()
Avoid extra map lookup by hooking check into cache get.

```
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/scrape
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
                     │  main.txt   │              new.txt               │
                     │   sec/op    │   sec/op     vs base               │
ScrapeLoopAppend-8     66.72µ ± 0%   66.89µ ± 0%       ~ (p=0.879 n=50)
ScrapeLoopAppendOM-8   66.61µ ± 0%   66.89µ ± 1%       ~ (p=0.115 n=50)
geomean                66.66µ        66.89µ       +0.34%

                     │   main.txt    │               new.txt                │
                     │     B/op      │     B/op      vs base                │
ScrapeLoopAppend-8     20.17Ki ±  1%   20.12Ki ± 1%        ~ (p=0.343 n=50)
ScrapeLoopAppendOM-8   20.38Ki ± 10%   17.99Ki ± 2%  -11.69% (p=0.017 n=50)
geomean                20.27Ki         19.03Ki        -6.14%

                     │  main.txt  │               new.txt               │
                     │ allocs/op  │ allocs/op   vs base                 │
ScrapeLoopAppend-8     11.00 ± 0%   11.00 ± 0%       ~ (p=1.000 n=50) ¹
ScrapeLoopAppendOM-8   12.00 ± 0%   12.00 ± 0%       ~ (p=1.000 n=50) ¹
geomean                11.49        11.49       +0.00%
¹ all samples are equal
```

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2024-02-27 12:09:37 +00:00
Łukasz Mierzwa
21f8b35f5b Move staleness tracking out of checkAddError() calls
This call bloats checkAddError signature and logic, we can and should call it from the main scrape logic.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2024-02-27 11:36:16 +00:00
Łukasz Mierzwa
50c81bed86 Check for duplicated series on a scrape
When Prometheus scrapes a target and it sees the same time series repeated multiple times it currently silently ignores that.
This change adds a test for that and fixes the scrape loop so that:

- Only first sample for each unique time series is appended
- Duplicated samples increment the prometheus_target_scrapes_sample_duplicate_timestamp_total metric

This allows one to identify such scrape jobs and targets.

Benchmark results:

```
name                            old time/op    new time/op    delta
ScrapeLoopAppend-8                64.8µs ± 2%    71.1µs ±20%   +9.75%  (p=0.000 n=10+10)
ScrapeLoopAppendOM-8              64.2µs ± 1%    68.5µs ± 7%   +6.71%  (p=0.000 n=9+10)
TargetsFromGroup/1_targets-8      14.2µs ± 1%    14.5µs ± 1%   +1.99%  (p=0.000 n=10+10)
TargetsFromGroup/10_targets-8      149µs ± 1%     152µs ± 1%   +2.05%  (p=0.000 n=9+10)
TargetsFromGroup/100_targets-8    1.49ms ± 4%    1.48ms ± 1%     ~     (p=0.796 n=10+10)

name                            old alloc/op   new alloc/op   delta
ScrapeLoopAppend-8                19.9kB ± 1%    17.8kB ± 3%  -10.23%  (p=0.000 n=8+10)
ScrapeLoopAppendOM-8              19.9kB ± 1%    18.3kB ±10%   -8.14%  (p=0.001 n=9+10)
TargetsFromGroup/1_targets-8      2.43kB ± 0%    2.43kB ± 0%   -0.15%  (p=0.045 n=10+10)
TargetsFromGroup/10_targets-8     24.3kB ± 0%    24.3kB ± 0%     ~     (p=0.083 n=10+9)
TargetsFromGroup/100_targets-8     243kB ± 0%     243kB ± 0%     ~     (p=0.720 n=9+10)

name                            old allocs/op  new allocs/op  delta
ScrapeLoopAppend-8                  9.00 ± 0%      9.00 ± 0%     ~     (all equal)
ScrapeLoopAppendOM-8                10.0 ± 0%      10.0 ± 0%     ~     (all equal)
TargetsFromGroup/1_targets-8        40.0 ± 0%      40.0 ± 0%     ~     (all equal)
TargetsFromGroup/10_targets-8        400 ± 0%       400 ± 0%     ~     (all equal)
TargetsFromGroup/100_targets-8     4.00k ± 0%     4.00k ± 0%     ~     (all equal)
```

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2024-02-27 11:36:16 +00:00
Bryan Boreham
5f50d974c9 scraping: reset symbol table periodically
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-26 11:45:25 +00:00
Bryan Boreham
4e748b9cd8 scraping: re-use labels Builder in scrape report metrics
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-26 11:45:25 +00:00
Bryan Boreham
abb3a62f04 scraping: re-use symbol table for scrape loops
One symbol table for all loops in the same scrape pool, i.e. from the
same job.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-26 11:45:25 +00:00
Bryan Boreham
0403d098e1 scraping: re-use symbolTable for target discovery
Call labels.NewBuilderWithSymbolTable.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-26 11:45:25 +00:00
Łukasz Mierzwa
5597020a60 Use github.com/klauspost/compress for gzip and zlib
klauspost/compress is a high quality drop-in replacement for common Go
compression libraries. Since Prometheus sends out a lot of HTTP requests
that often return compressed output having improved compression
libraries helps to save cpu & memory resources.
On a test Prometheus server I was able to see cpu reduction from 31 to
30 cores.

Benchmark results:

name                                old time/op    new time/op    delta
TargetScraperGzip/metrics=1-8         69.4µs ± 4%    69.2µs ± 3%     ~     (p=0.122 n=50+50)
TargetScraperGzip/metrics=100-8       84.3µs ± 2%    80.9µs ± 2%   -4.02%  (p=0.000 n=48+46)
TargetScraperGzip/metrics=1000-8       296µs ± 1%     274µs ±14%   -7.35%  (p=0.000 n=47+45)
TargetScraperGzip/metrics=10000-8     2.06ms ± 1%    1.66ms ± 2%  -19.34%  (p=0.000 n=47+45)
TargetScraperGzip/metrics=100000-8    20.9ms ± 2%    17.5ms ± 3%  -16.50%  (p=0.000 n=49+50)

name                                old alloc/op   new alloc/op   delta
TargetScraperGzip/metrics=1-8         6.06kB ± 0%    6.07kB ± 0%   +0.24%  (p=0.000 n=48+48)
TargetScraperGzip/metrics=100-8       7.04kB ± 0%    6.89kB ± 0%   -2.17%  (p=0.000 n=49+50)
TargetScraperGzip/metrics=1000-8      9.02kB ± 0%    8.35kB ± 1%   -7.49%  (p=0.000 n=50+50)
TargetScraperGzip/metrics=10000-8     18.1kB ± 1%    16.1kB ± 2%  -10.87%  (p=0.000 n=47+47)
TargetScraperGzip/metrics=100000-8    1.21MB ± 0%    1.01MB ± 2%  -16.69%  (p=0.000 n=36+50)

name                                old allocs/op  new allocs/op  delta
TargetScraperGzip/metrics=1-8           71.0 ± 0%      72.0 ± 0%   +1.41%  (p=0.000 n=50+50)
TargetScraperGzip/metrics=100-8         81.0 ± 0%      76.0 ± 0%   -6.17%  (p=0.000 n=50+50)
TargetScraperGzip/metrics=1000-8        92.0 ± 0%      83.0 ± 0%   -9.78%  (p=0.000 n=50+50)
TargetScraperGzip/metrics=10000-8       93.0 ± 0%      91.0 ± 0%   -2.15%  (p=0.000 n=50+50)
TargetScraperGzip/metrics=100000-8       111 ± 0%       135 ± 1%  +21.89%  (p=0.000 n=40+50)

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2024-02-22 17:08:15 +00:00
Julien Pivotto
e22b564049 Always align scrapes even if tolerance is bigger than 1% of scrape interval
When the scrape tolerance is bigger than 1% of the scrape interval, take
1% of the scrape interval as the tolerance instead of not aligning the
scrape at all.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2024-02-21 15:09:21 +01:00
Ziqi Zhao
df2a0ecf3b
Native Histograms: support native_histogram_min_bucket_factor in scrape_config (#13222)
Native Histograms: support native_histogram_min_bucket_factor in scrape_config

---------

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2024-01-17 16:58:54 +01:00
Julien Pivotto
0763ec841b
Merge pull request #13313 from kalpadiptyaroy/fix-quality-value-accept-header
bug: Fix quality value in accept header
2023-12-21 11:40:30 +01:00
Kumar Kalpadiptya Roy
b012366c33 Issue #13268: fix quality value in accept header
Signed-off-by: Kumar Kalpadiptya Roy <kalpadiptya.roy@outlook.com>
2023-12-21 10:33:05 +05:30
Bryan Boreham
c83e1fc574 textparse: remove MetricType alias
No backwards-compatibility; make a clean break.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-12-19 18:56:54 +00:00
Bryan Boreham
8065bef172 Move metric type definitions to common/model
They are used in multiple repos, so common is a better place for them.
Several packages now don't depend on `model/textparse`, e.g.
`storage/remote`.

Also remove `metadata` struct from `api.go`, since it was identical to
a struct in the `metadata` package.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-12-19 18:56:54 +00:00
Arthur Silva Sens
5082655392
Append Created Timestamps (#12733)
* Append created timestamps.

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Log when created timestamps are ignored

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Proposed changes to Append CT PR.

Changes:

* Changed textparse Parser interface for consistency and robustness.
* Changed CT interface to be more explicit and handle validation.
* Simplified test, change scrapeManager to allow testability.
* Added TODOs.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Updates.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Refactor head_appender test

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Fix linter issues

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

* Use model.Sample in head appender test

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>

---------

Signed-off-by: Arthur Silva Sens <arthur.sens@coralogix.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Co-authored-by: bwplotka <bwplotka@gmail.com>
2023-12-11 08:43:42 +00:00
Julien Pivotto
965e603fa7
Merge pull request #13184 from bboreham/exemplar-sort
Scraping: use slices.sort for exemplars
2023-11-25 09:34:48 +01:00
Bryan Boreham
f0e1b592ab Scraping: use slices.sort for exemplars
The sort implementation using Go generics is used everywhere else
in Prometheus.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-11-24 14:42:26 +00:00
Bryan Boreham
9051100aba Scraping: share buffer pool across all scrapes
Previously we had one per scrapePool, and one of those per configured
scraping job. Each pool holds a few unused buffers, so sharing one
across all scrapePools reduces total heap memory.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-11-23 10:23:34 +00:00
Łukasz Mierzwa
870627fbed Add enable_compression scrape config option
Currently Prometheus will always request gzip compression from the target when sending scrape requests.
HTTP compression does reduce the amount of bytes sent over the wire and so is often desirable.
The downside of compression is that it requires extra resources - cpu & memory.

This also affects the resource usage on the target since it has to compress the response
before sending it to Prometheus.

This change adds a new option to the scrape job configuration block: enable_compression.
The default is true so it remains the same as current Prometheus behaviour.

Setting this option to false allows users to disable compression between Prometheus
and the scraped target, which will require more bandwidth but it lowers the resource
usage of both Prometheus and the target.

Fixes #12319.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2023-11-20 12:02:55 +00:00
zenador
32ee1b15de
Fix error on ingesting out-of-order exemplars (#13021)
Fix and improve ingesting exemplars for native histograms.

See code comment for a detailed explanation of the algorithm.

Note that this changes the current behavior for all kind of samples slightly: We now allow exemplars with the same timestamp as during the last scrape if the value or the labels have changed.

Also note that we now do not ingest exemplars without timestamps for native histograms anymore.

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>

---------

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: zenador <zenador@users.noreply.github.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2023-11-16 15:07:37 +01:00
Matthieu MOREL
7eaefcf379
ci(lint): enable errorlint on scrape (#12923)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Signed-off-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
2023-11-01 20:06:46 +01:00
Björn Rabenstein
a43669e611
Merge pull request #12928 from alexandear/ci-enable-godot
ci(lint): enable godot; append dot at the end of comments
2023-11-01 17:15:41 +01:00
Julien Pivotto
84aadfc45b scrape: Added trackTimestampsStaleness configuration option
Add the ability to track staleness when an explicit timestamp is set.
Useful for cAdvisor.

Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
2023-10-31 16:58:42 -04:00
Oleksandr Redko
fa90ca46e5 ci(lint): enable godot; append dot at the end of comments
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 19:53:38 +02:00
Paulin Todev
5752050b42
Scrape metrics can now be registered with a non-default registry.
* A registerer is passed to the scrape Manager,
and all scrape metrics register with it.
* For now the registry which we pass to the scrape
Manager is still the global one.

Signed-off-by: Paulin Todev <paulin.todev@gmail.com>
2023-10-11 16:19:00 +01:00
Bartlomiej Plotka
624b973ebf
Added ability to specify scrape protocols to accept during HTTP content type negotiation. (#12738)
* Added ability to specify scrape protocols to accept during HTTP content type negotiation.


This is done via new option in GlobalConfig and ScrapeConfig: "scrape_protocol"

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Fixed readability and log message.

Signed-off-by: bwplotka <bwplotka@gmail.com>

---------

Signed-off-by: bwplotka <bwplotka@gmail.com>
2023-10-10 11:16:55 +01:00
Bryan Boreham
f6d9c84fde
scraping: delay creating buffer, to save memory (#12953)
We don't need the buffer to read the response until the scrape http call
returns; creating it earlier makes the buffer pool larger.

I split `scrape()` into `scrape()` which returns with the http response,
and `readResponse()` which decompresses and copies the data into the
supplied buffer. This design was chosen to minimize impact on the logic.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-10-09 17:23:53 +01:00
Bryan Boreham
7c934ae18c scraping: hoist labels variable to save garbage
`lset` escapes to heap due to being passed through the text-parser
interface, so we can reduce garbage by hoisting it out of the loop so
only one allocation is done for every series in a scrape.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-10-05 11:04:59 +00:00
Goutham Veeramachaneni
86729d4d7b
Update exp package (#12650) 2023-09-21 22:53:51 +02:00
Bryan Boreham
611f50bb3d scrape: retain all dropped targets when KeepDroppedTargets is zero
This was a bug.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-08-20 14:32:23 +01:00
Bryan Boreham
1e3fef6ab0
scraping: limit detail on dropped targets, to save memory (#12647)
It's possible (quite common on Kubernetes) to have a service discovery
return thousands of targets then drop most of them in relabel rules.
The main place this data is used is to display in the web UI, where
you don't want thousands of lines of display.

The new limit is `keep_dropped_targets`, which defaults to 0
for backwards-compatibility.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-08-14 15:39:25 +01:00
beorn7
0e3f35324b scrape: Enable ingestion of multiple exemplars per sample
This has become a requirement for native histograms, as a single
histogram sample commonly has many buckets, so that providing many
exemplars makes sense.

Since OM text doesn't support native histograms yet, the test had to
be expanded to also support protobuf test cases.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-07-13 14:16:10 +02:00
Bryan Boreham
5255bf06ad Replace sort.Slice with faster slices.SortFunc
The generic version is more efficient.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2023-07-02 22:17:08 +00:00
Julius Volz
cb045c0e4b Fix wording from "jitterSeed" -> "offsetSeed" for server-wide scrape offsets
In digital communication, "jitter" usually refers to how much a signal deviates
from true periodicity, see https://en.wikipedia.org/wiki/Jitter. The way we are
using the "jitterSeed" in Prometheus does not affect the true periodicity at
all, but just introduces a constant phase shift (or offset) within the period.
So it would be more correct and less confusing to call the "jitterSeed" an
"offsetSeed" instead.

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2023-05-25 11:54:00 +02:00
beorn7
9e500345f3 textparse/scrape: Add option to scrape both classic and native histograms
So far, if a target exposes a histogram with both classic and native
buckets, a native-histogram enabled Prometheus would ignore the
classic buckets. With the new scrape config option
`scrape_classic_histograms` set, both buckets will be ingested,
creating all the series of a classic histogram in parallel to the
native histogram series. For example, a histogram `foo` would create a
native histogram series `foo` and classic series called `foo_sum`,
`foo_count`, and `foo_bucket`.

This feature can be used in a migration strategy from classic to
native histograms, where it is desired to have a transition period
during which both native and classic histograms are present.

Note that two bugs in classic histogram parsing were found and fixed
as a byproduct of testing the new feature:

1. Series created from classic _gauge_ histograms didn't get the
   _sum/_count/_bucket prefix set.
2. Values of classic _float_ histograms weren't parsed properly.

Signed-off-by: beorn7 <beorn@grafana.com>
2023-05-13 01:32:25 +02:00
Jeanette Tan
40240c9c1c Update according to code review
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-05-05 02:33:00 +08:00
Jeanette Tan
2ad39baa72 Treat bucket limit like sample limit and make it fail the whole scrape and return an error
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-04-22 03:25:07 +08:00
Jeanette Tan
4d21ac23e6 Implement bucket limit for native histograms
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2023-04-22 03:14:19 +08:00