* Adding scape on shutdown
Signed-off-by: avilevy <avilevy@google.com>
* scrape: replace skipOffsetting to make the test offset deterministic instead of skipping it entirely
Signed-off-by: avilevy <avilevy@google.com>
* renamed calculateScrapeOffset to getScrapeOffset
Signed-off-by: avilevy <avilevy@google.com>
* test(scrape): refactor time-based manager tests to use synctest
Addresses PR feedback to remove flaky, time-based sleeping in the scrape manager tests.
Add TestManager_InitialScrapeOffset and TestManager_ScrapeOnShutdown to use the testing/synctest package, completely eliminating real-world time.Sleep delays and making the assertions 100% deterministic.
- Replaced httptest.Server with net.Pipe and a custom startFakeHTTPServer helper to ensure all network I/O remains durably blocked inside the synctest bubble.
- Leveraged the skipOffsetting option to eliminate random scrape jitter, making the time-travel math exact and predictable.
- Using skipOffsetting also safely bypasses the global singleflight DNS lookup in setOffsetSeed, which previously caused cross-bubble panics in synctest.
- Extracted shared boilerplate into a setupSynctestManager helper to keep the test cases highly readable and data-driven.
Signed-off-by: avilevy <avilevy@google.com>
* Clarify use cases in InitialScrapeOffset comment
Signed-off-by: avilevy <avilevy@google.com>
* test(scrape): use httptest for mock server to respect context cancellation
- Replaced manual HTTP string formatting over `net.Pipe` with `httptest.NewUnstartedServer`.
- Implemented an in-memory `pipeListener` to allow the server to handle `net.Pipe` connections directly. This preserves `synctest` time isolation without opening real OS ports.
- Added explicit `r.Context().Done()` handling in the mock HTTP handler to properly simulate aborted requests and scrape timeouts.
- Validates that the request context remains active and is not prematurely cancelled during `ScrapeOnShutdown` scenarios.
- Renamed `skipOffsetting` to `skipJitterOffsetting`.
- Addressed other PR comments.
Signed-off-by: avilevy <avilevy@google.com>
* tmp
Signed-off-by: bwplotka <bwplotka@gmail.com>
* exp2
Signed-off-by: bwplotka <bwplotka@gmail.com>
* fix
Signed-off-by: bwplotka <bwplotka@gmail.com>
* scrape: fix scrapeOnShutdown context bug and refactor test helpers
The scrapeOnShutdown feature was failing during manager shutdown because
the scrape pool context was being cancelled before the final shutdown
scrapes could execute. Fix this by delaying context cancellation
in scrapePool.stop() until after all scrape loops have stopped.
In addition:
- Added test cases to verify scrapeOnShutdown works with InitialScrapeOffset.
- Refactored network test helper functions from manager_test.go to
helpers_test.go.
- Addressed other comments.
Signed-off-by: avilevy <avilevy@google.com>
* Update scrape/scrape.go
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: avilevy18 <105948922+avilevy18@users.noreply.github.com>
---------
Signed-off-by: avilevy <avilevy@google.com>
Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: avilevy18 <105948922+avilevy18@users.noreply.github.com>
Co-authored-by: bwplotka <bwplotka@gmail.com>
Remove the separate scrapeFailureLoggerMtx and use targetMtx instead
for synchronizing access to scrapeFailureLogger. This fixes a data race
where Sync() would read scrapeFailureLogger while holding targetMtx but
SetScrapeFailureLogger() would write to it while holding a different mutex.
Add regression test to catch concurrent access issues.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
* fix(scrape): use HonorLabels instead of HonorTimestamps in newScrapeLoop
The sampleMutator closure in newScrapeLoop was incorrectly passing
HonorTimestamps to mutateSampleLabels instead of HonorLabels. This
caused honor_labels configuration to be ignored, with the behavior
incorrectly controlled by honor_timestamps instead.
Adding TestNewScrapeLoopHonorLabelsWiring integration test that exercises
the real newScrapeLoop constructor with HonorLabels and HonorTimestamps
set to opposite values to catch this class of wiring bug.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Update scrape/scrape_test.go
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Add honor_labels=false test case
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
This adds the following native histograms (with a few classic buckets for backwards compatibility), while keeping the corresponding summaries (same name, just without `_histogram`):
- `prometheus_sd_refresh_duration_histogram_seconds`
- `prometheus_rule_evaluation_duration_histogram_seconds`
- `prometheus_rule_group_duration_histogram_seconds`
- `prometheus_target_sync_length_histogram_seconds`
- `prometheus_target_interval_length_histogram_seconds`
- `prometheus_engine_query_duration_histogram_seconds`
Signed-off-by: Harsh <harshmastic@gmail.com>
Signed-off-by: harsh kumar <135993950+hxrshxz@users.noreply.github.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
Partially fixes https://github.com/prometheus/prometheus/issues/17416 by
renaming all CT* names to ST* in the whole codebase except RW2 (this is
done in separate
[PR](https://github.com/prometheus/prometheus/pull/17411)) and
PrometheusProto exposition proto.
```
CreatedTimestamp -> StartTimestamp
CreatedTimeStamp -> StartTimestamp
created_timestamp -> start_timestamp
CT -> ST
ct -> st
```
Signed-off-by: bwplotka <bwplotka@gmail.com>
The detailed plan for this is laid out in
https://github.com/prometheus/prometheus/issues/16572 .
This commit adds a global and local scrape config option
`scrape_native_histograms`, which has to be set to true to ingest
native histograms.
To ease the transition, the feature flag is changed to simply set the
default of `scrape_native_histograms` to true.
Further implications:
- The default scrape protocols now depend on the
`scrape_native_histograms` setting.
- Everywhere else, histograms are now "on by default".
Documentation beyond the one for the feature flag and the scrape
config are deliberately left out. See
https://github.com/prometheus/prometheus/pull/17232 for that.
Signed-off-by: beorn7 <beorn@grafana.com>
Hide adding NHCB parser on top another parser in New() function
so we can easily add direct NHCB capable parsers.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
This was added in Go 1.21, and is neater than a loop deleting all
elements.
Also move the comment noting why we do this, because it could be read
as saying this is the only reason we have two maps.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Instead of the labels hash, which could collide between two different
series, use the SeriesRef which is unique.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.
This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency what
was done automatically.
Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.
Signed-off-by: beorn7 <beorn@grafana.com>
Go will not allocate when reading from a map with a key cast from []byte to string.
Also remove some yoloString calls in package `textparse` - call a more suitable library function.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Scrape cache is used to emit StaleNaN markers after a series disappears so it should only hold entries for series that did end up in TSDB, which is not always the case due to sample_limit.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Currently all staleness markers are appended for any sample that disappears from scrape cache, even if that sample was never appended to TSDB.
When staleness markers are appended they always use ref=0 as the SeriesRef, so the downstream appender doesn't know if the sample is for a know series or not.
This changes the scrape cache so the map used for staleness tracking stores the cache entry instead of only the label set. Having the cache entry means:
- we can ignore stale samples that didn't end up in TSDB (not in the scrape cache)
- we can append them to TSDB using correct ref value, so the appender knows if they are for know or unknown series
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Addresses https://github.com/prometheus/prometheus/issues/16371
This will help with migrating to native histograms with `convert_classic_histograms_to_nhcb` since users may still need to keep the classic histograms during a migration
Signed-off-by: chardch <otwordsne@gmail.com>
The new metric_name_escaping_scheme config option works in parallel with metric_name_validation_scheme and controls which escaping scheme is requested when scraping. When not specified, the scheme will request underscores if the validation scheme is set to legacy, and will request allow-utf-8 when the validation scheme is set to utf8. This setting allows users to allow utf8 names if they like, but explicitly request an escaping scheme rather than UTF-8.
Fixes https://github.com/prometheus/prometheus/issues/16034
Built on https://github.com/prometheus/prometheus/pull/16080
Signed-off-by: Owen Williams <owen.williams@grafana.com>
No need to validate, track, add sample, add exemplar if series isn't
going to be ingested. Basically this is the same as if this series was
dropped by relabelling.
Fixes#16217
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
We use the cache iteration number to detect when the same metric has
occurred twice in a scrape. We need to bump this number at the end of
every scrape, not just successful scrapes.
The `iter` number is also used:
* After a successful scrape, to delete older metrics and metadata.
* To detect when metadata changed in this scrape.
None of those additional cases is broken by incrementing the number
on error.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
..instead of *int64. This is as an optimization and ease of use. We already
accepted in many places (proto histograms, PRW) that CT (or any timestamp really) 0
means not set.
Signed-off-by: bwplotka <bwplotka@gmail.com>
Change case order for switch scrapeLoop
This commit changes the ordering in error identification switch cases for better production performance and adds reasonings behind error switch case order as comments.
---------
Signed-off-by: Laimis Juzeliūnas <asnelaimis@gmail.com>