Guard the stale series ratio calculation by checking numSeries > 0
before computing the ratio. This prevents division by zero when
the head has no series.
Fixes #17949
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
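A minimal sketch of the guard described above. The function and parameter names are hypothetical, not the identifiers used in the TSDB head; only the `numSeries > 0` check comes from the commit message.

```go
package main

import "fmt"

// staleSeriesRatio returns the fraction of head series that are stale.
// Hypothetical helper: it returns 0 when the head has no series, so the
// division is never attempted with a zero denominator.
func staleSeriesRatio(numStale, numSeries uint64) float64 {
	if numSeries == 0 {
		return 0
	}
	return float64(numStale) / float64(numSeries)
}

func main() {
	fmt.Println(staleSeriesRatio(0, 0))    // empty head
	fmt.Println(staleSeriesRatio(25, 100)) // a quarter of the series are stale
}
```

Returning 0 for an empty head is an assumption of this sketch; the point is only that the ratio is not computed unless there is at least one series.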
The createAttributes error was incorrectly returning nil instead of err,
causing errors to be silently discarded. This could lead to silent data
loss for sum metrics during OTLP ingestion.
Fixes #17953
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
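The class of bug fixed above can be sketched as follows. This is an illustration only: `createAttributes` here is a simplified stand-in with an invented signature, and `addSumPoint` is a hypothetical caller, not the real OTLP translation code.

```go
package main

import (
	"errors"
	"fmt"
)

var errInvalidAttrs = errors.New("invalid attributes")

// createAttributes is a hypothetical stand-in that fails on invalid input.
func createAttributes(valid bool) ([]string, error) {
	if !valid {
		return nil, errInvalidAttrs
	}
	return []string{"job", "instance"}, nil
}

// addSumPoint is a hypothetical caller. The buggy variant returned nil in
// the error branch, so the caller never learned that the point was dropped.
func addSumPoint(valid bool) error {
	lbls, err := createAttributes(valid)
	if err != nil {
		return err // the bug was `return nil` here
	}
	_ = lbls // the data point would be appended with these labels
	return nil
}

func main() {
	fmt.Println(addSumPoint(true))
	fmt.Println(addSumPoint(false))
}
```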
* improve readability of time series filtering by using the slices package
Signed-off-by: Callum Styan <callumstyan@gmail.com>
* ensure that BenchmarkBuildTimeSeries excludes the building of the
actual proto from the benchmark results; we only care about the
buildTimeSeries call
Signed-off-by: Callum Styan <callumstyan@gmail.com>
---------
Signed-off-by: Callum Styan <callumstyan@gmail.com>
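As an illustration of filtering with the standard `slices` package (Go 1.21 or later), here is a small sketch. The `timeSeries` type and the timestamp-based predicate are assumptions made for the example; they are not the types or the filter condition used in the remote-write code.

```go
package main

import (
	"fmt"
	"slices"
)

// timeSeries is a hypothetical, simplified series holding one timestamp.
type timeSeries struct {
	name      string
	timestamp int64
}

// dropOlderThan removes, in place, every series whose timestamp is below
// minTS and returns the shortened slice.
func dropOlderThan(series []timeSeries, minTS int64) []timeSeries {
	return slices.DeleteFunc(series, func(ts timeSeries) bool {
		return ts.timestamp < minTS
	})
}

func main() {
	series := []timeSeries{{"a", 10}, {"b", 50}, {"c", 5}, {"d", 70}}
	kept := dropOlderThan(series, 20)
	for _, ts := range kept {
		fmt.Println(ts.name, ts.timestamp)
	}
}
```

`slices.DeleteFunc` modifies the underlying array, so the original slice should not be used after the call.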
Branch protection means they cannot merge PRs to main/release branches.
Branch protection means they cannot approve things outside their area for
PRs to main/release branches.
Also add sysadmind (Joe) as owner of aws, to make sure he gets notified.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
We discussed IRL. Nico no longer has time to contribute.
This also syncs the file with CODEOWNERS.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* chore(sd-ownership): add default-maintainers as default code owner
In accordance with dev summit decision.
At the same time I've set up auto assignment for code review, meaning
that not everybody will get notified for all PRs. If there's already
a maintainer assigned, you don't get notified. Otherwise the
assignment is round-robin, 1 at a time. Also you can opt out.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Remove code owner without write access
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Add waitForQueryLog helper that polls for query log entries to appear
before asserting, rather than reading the file immediately after making
a query. This fixes a race condition where the query log wasn't flushed
to disk before the test read the file.
The helper uses a 5 second timeout with 100ms polling intervals, which
is generous enough to handle slow CI environments while keeping the test
responsive.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
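The polling approach can be sketched generically as below. This is not the actual `waitForQueryLog` helper: `waitFor` and its condition callback are hypothetical, and only the 5 second timeout and 100ms interval are taken from the description above.

```go
package main

import (
	"fmt"
	"time"
)

const (
	pollTimeout  = 5 * time.Second
	pollInterval = 100 * time.Millisecond
)

// waitFor calls cond every pollInterval until it returns true or
// pollTimeout has elapsed. It reports whether cond became true in time.
func waitFor(cond func() bool) bool {
	deadline := time.Now().Add(pollTimeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(pollInterval)
	}
}

func main() {
	// Stand-in for "the query log file now contains the expected entry".
	attempts := 0
	ok := waitFor(func() bool {
		attempts++
		return attempts >= 3
	})
	fmt.Println(ok, attempts)
}
```

In a test, the condition would read the log file and check for the expected number of entries, and the assertions would run only after `waitFor` returns true.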
tsdb: Early compaction of stale series
Closes #13616
Based on https://github.com/prometheus/proposals/pull/55
Stale series tracking was added in #16925. This PR compacts the stale series into their own block before the normal compaction runs. Here is how the setting works:
stale_series_compaction_threshold: As soon as the ratio of stale series in the head block crosses StaleSeriesImmediateCompactionThreshold, TSDB performs a stale series compaction, puts all the stale series into a block, and removes them from the head, but it does not remove them from the WAL. (Technically, this condition is checked every minute, so the compaction is not exactly immediate.)
Additional details
WAL replay: after a stale series compaction, tombstones covering (MinInt64, MaxInt64) are added for all these stale series. During WAL replay we add a special condition: when we find such a tombstone, the series is immediately removed from memory instead of the tombstone being stored. This is required so that memory does not spike during WAL replay and the compacted stale series are not kept in memory.
Head block truncation ignores this block via the added metadata, similar to out-of-order blocks.
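The WAL replay rule can be sketched roughly as follows. All types and names here (`interval`, `head`, `replayTombstone`) are hypothetical simplifications for illustration; the real head and tombstone structures are more involved.

```go
package main

import (
	"fmt"
	"math"
)

// interval is a hypothetical closed time range covered by a tombstone.
type interval struct{ mint, maxt int64 }

// fullRange returns the (MinInt64, MaxInt64) marker interval.
func fullRange() interval {
	return interval{mint: math.MinInt64, maxt: math.MaxInt64}
}

// head is a hypothetical, much simplified in-memory head.
type head struct {
	series     map[uint64]struct{}
	tombstones map[uint64][]interval
}

func newHead(refs ...uint64) *head {
	h := &head{series: map[uint64]struct{}{}, tombstones: map[uint64][]interval{}}
	for _, r := range refs {
		h.series[r] = struct{}{}
	}
	return h
}

// replayTombstone applies one tombstone record during replay. A tombstone
// spanning all time marks a series that was already compacted as stale,
// so the series is dropped from memory rather than the tombstone stored.
func (h *head) replayTombstone(ref uint64, iv interval) {
	if iv == fullRange() {
		delete(h.series, ref)
		return
	}
	h.tombstones[ref] = append(h.tombstones[ref], iv)
}

func main() {
	h := newHead(1, 2)
	h.replayTombstone(1, fullRange())
	h.replayTombstone(2, interval{mint: 100, maxt: 200})
	fmt.Println(len(h.series), len(h.tombstones))
}
```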
* otlptranslator: filter __name__ from OTLP attributes to prevent duplicates
OTLP metrics can have a __name__ attribute which, when combined with the
metric name passed via extras, creates duplicate __name__ labels.
This commit filters out any __name__ metric attribute from OTLP.
Also rename TestCreateAttributes to TestPrometheusConverter_createAttributes
for consistency, and add test cases for __name__, __type__, and __unit__ OTLP metric attributes.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
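A small sketch of the idea, assuming a simplified label representation. The `label` type and `buildLabels` function are hypothetical and do not mirror the real `createAttributes` signature; the sketch only shows dropping an incoming `__name__` attribute so that the metric name supplied separately is the only `__name__` label.

```go
package main

import "fmt"

// label is a hypothetical name/value pair.
type label struct{ name, value string }

const metricNameLabel = "__name__"

// buildLabels copies the OTLP attributes, skipping any __name__ attribute,
// and then adds the metric name exactly once.
func buildLabels(attrs []label, metricName string) []label {
	out := make([]label, 0, len(attrs)+1)
	for _, a := range attrs {
		if a.name == metricNameLabel {
			continue // would otherwise duplicate the __name__ label
		}
		out = append(out, a)
	}
	return append(out, label{metricNameLabel, metricName})
}

func main() {
	attrs := []label{{"__name__", "from_attribute"}, {"job", "api"}}
	for _, l := range buildLabels(attrs, "http_requests_total") {
		fmt.Printf("%s=%q\n", l.name, l.value)
	}
}
```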
Add a test for `LeveledCompactor.Plan()` stopping after a block matches the
`BlockExcludeFilter`, as a sub-test
`TestLeveledCompactor/Plan/BlockExcludeFilter stops iteration`.
Also move `TestLeveledCompactor_plan` to a sub-test
of `TestLeveledCompactor`, for consistency.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Add bounds check to prevent index out of range panic when
trimStringByBytes receives a string containing only UTF-8 continuation
bytes (0x80-0xBF). Previously, the loop would decrement size below 0
when no valid rune start byte was found, causing a panic.
A malicious query string with only continuation bytes could crash
the Prometheus server via the ActiveQueryTracker before the query
was parsed or validated.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
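The failure mode and the bounds check can be sketched as below. This is a reconstruction from the description, not a copy of the code in the repository; the function body is an assumption about how such a trimming loop might look.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// trimStringByBytes truncates str to at most size bytes without cutting a
// multi-byte rune in half. The `size > 0` condition is the bounds check:
// without it, input made only of continuation bytes walks size below 0
// and the index expression panics.
func trimStringByBytes(str string, size int) string {
	b := []byte(str)
	if size >= len(b) {
		return str
	}
	for size > 0 && !utf8.RuneStart(b[size]) {
		size--
	}
	return string(b[:size])
}

func main() {
	fmt.Printf("%q\n", trimStringByBytes("héllo", 2))        // cut falls inside "é"
	fmt.Printf("%q\n", trimStringByBytes("\x80\x80\x80", 1)) // only continuation bytes
}
```

`utf8.RuneStart` reports whether a byte could be the first byte of an encoded rune, so the loop backs up until the cut point sits on a rune boundary or reaches the start of the string.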
These bugs were discovered accidentally with code analysis:
- https://app.devin.ai/review/prometheus/prometheus/pull/16355
Upon further inspection and performing more analysis, 3 potential bugs were found:
1. sendloops could continue running if corresponding AM changed position in the config
2. multiple configs with the same hash would share sendloops resulting in sets without sendloops
3. sendloops could continue running if the config hash was changed
- `TestApplyConfigSendLoopsNotStoppedOnKeyChange`: Verifies sendLoops work when keys swap (no fix needed)
- `TestApplyConfigDuplicateHashSharesSendLoops`: Verifies sendLoops are independent with duplicate hashes (bug fixed)
- `TestApplyConfigHashChangeLeaksSendLoops`: Verifies sendLoops are cleaned up when hash changes (bug fixed)
Signed-off-by: Siavash Safi <siavash@cloudflare.com>
* fix(teststorage/appender.go): TODO and Sample staleness check
Allow different order of consecutive stale samples between the expected
and actual array for RequireEqual and RequireNotEqual by trying to
swap the expected side until it matches.
Also fix the definition of a stale sample in the test: staleness is
defined not only for floats but for native histograms as well.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* add unit tests
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
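A sketch of a staleness check that covers both sample kinds. It assumes that staleness is signalled by a specific NaN bit pattern and that a histogram sample is stale when its sum carries that marker; the `sample` and `histogram` types are hypothetical simplifications, not the test package's actual types.

```go
package main

import (
	"fmt"
	"math"
)

// staleNaNBits is the NaN bit pattern assumed to mark a stale sample.
const staleNaNBits uint64 = 0x7ff0000000000002

func staleNaN() float64 { return math.Float64frombits(staleNaNBits) }

// isStaleNaN compares bit patterns, since NaN != NaN under ==.
func isStaleNaN(v float64) bool { return math.Float64bits(v) == staleNaNBits }

// histogram is a hypothetical stand-in for a native histogram.
type histogram struct{ Sum float64 }

// sample is either a float sample (H == nil) or a histogram sample.
type sample struct {
	F float64
	H *histogram
}

// isStale handles both float and histogram samples.
func isStale(s sample) bool {
	if s.H != nil {
		return isStaleNaN(s.H.Sum)
	}
	return isStaleNaN(s.F)
}

func main() {
	fmt.Println(isStale(sample{F: staleNaN()}))
	fmt.Println(isStale(sample{H: &histogram{Sum: staleNaN()}}))
	fmt.Println(isStale(sample{F: math.NaN()}))
}
```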
testutil.T was needed before https://go.dev/doc/go1.13#testingpkgtesting
Now it's inconsistent and confusing, so let's kill it.
Signed-off-by: bwplotka <bwplotka@gmail.com>
Update go.work from go 1.24.9 to go 1.24.0 to match the version
specified in all go.mod files across the project
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>