prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2025-12-03 00:21:02 +01:00

Author	SHA1	Message	Date
Björn Rabenstein	b8d19543b8	Add histogram validation in remote-read and during reducing resolution (#17561) ReduceResolution is currently called before validation during ingestion. This will cause a panic if there are not enough buckets in the histogram. If there are too many buckets, the spurious buckets are ignored, and therefore the error in the input histogram is masked. Furthermore, invalid negative offsets might cause problems, too. Therefore, we need to do some minimal validation in reduceResolution. Fortunately, it is easy and shouldn't slow things down. Sadly, it requires returning errors, which triggers a bunch of code changes. Even here there is a bright side: we can get rid of a few panics. (Remember: Don't panic!) In different news, we haven't done a full validation of histograms read via remote-read. This is not so much a security concern (as you can throw off Prometheus easily by feeding it bogus data via remote-read) but more that remote-read sources might be makeshift and could accidentally create invalid histograms. We really don't want to panic in that case. So this commit does not only add a check of the spans and buckets as needed for resolution reduction but also a full validation during remote-read. Signed-off-by: beorn7 <beorn@grafana.com>	2025-11-21 00:22:24 +01:00
beorn7	2dfc324821	model/histogram: Make histogram bucket iterators more robust Currently, iterating over histogram buckets can panic if the spans are not consistent with the buckets. We aim for validating histograms upon ingestion, but there might still be data corruptions on disk that could trigger the panic. While data corruption on disk is really bad and will lead to all kinds of weirdness, we should still avoid panic'ing. Note, though, that chunks are secured by checksums, so the corruptions won't realistically happen because of disk faults, but more likely because a chunk was generated in a faulty way in the first place, by a software bug or even maliciously. This commit prevents panics in the situation where there are fewer buckets than described by the spans. Note that the missing buckets will simply not be iterated over. There is no signalling of this problem. We might still consider this separately, but for now, I would say that this kind of corruption is exceedingly rare and doesn't deserve special treatment (which will add a whole lot of complexity to the code). Signed-off-by: beorn7 <beorn@grafana.com>	2025-11-19 16:37:51 +01:00
Grégoire	1174b0ce4f	model/textparse: Remove unit validation in protobuf parsing (#16834) Signed-off-by: Gregoire Verdier <gregoire.verdier@gmail.com>	2025-11-19 14:03:32 +01:00
Bartlomiej Plotka	f50ff0a40a	feat: rename CreatedTimestamp to StartTimestamp (#17523) Partially fixes https://github.com/prometheus/prometheus/issues/17416 by renaming all CT* names to ST* in the whole codebase except RW2 (this is done in separate [PR](https://github.com/prometheus/prometheus/pull/17411)) and PrometheusProto exposition proto. ``` CreatedTimestamp -> StartTimestamp CreatedTimeStamp -> StartTimestamp created_timestamp -> start_timestamp CT -> ST ct -> st ``` Signed-off-by: bwplotka <bwplotka@gmail.com>	2025-11-13 14:17:51 +00:00
Bryan Boreham	a57aea2915	Improve assertion failure message (#17252) Signed-off-by: Charles Korn <charles.korn@grafana.com> Co-authored-by: Charles Korn <charles.korn@grafana.com>	2025-11-12 11:53:32 +01:00
Ben Kochie	204249fcb5	Update golangci-lint (#17478) * Update golangci-lint to v2.6.0 * Fixup various linting issues. * Fixup deprecations. * Add exception for `labels.MetricName` deprecation. Signed-off-by: SuperQ <superq@gmail.com>	2025-11-05 13:47:34 +01:00
Ben Kochie	48956f60d7	Update modernize (#17471) Apply additional Go modernize tool improvements. Signed-off-by: SuperQ <superq@gmail.com>	2025-11-04 05:13:49 +00:00
Julius Volz	0093e2159e	Merge pull request #17337 from prometheus/ui/visualize-relabel-steps ui: Allow viewing detailed relabeling steps for each discovered target	2025-11-02 13:51:55 +01:00
Laurent Dufresne	a6793c20e8	Added tests for `histogram.Error` Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>	2025-10-30 08:47:03 +01:00
Laurent Dufresne	7621eb772c	histogram: Add `Error` type for all histogram errors `histogram.Error` becomes the generic wrapper type for all histogram errors. This makes it easier and less error prone when adding new errors to check if an error is a histogram error as well as making it less error prone to convert the errors. This changes the type of those specific sentinel errors from error to `histogram.Error`, but it should almost never matter. E.g., `errors.Is(err, ErrHistogram...)` would still work out of the box. Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>	2025-10-30 08:45:34 +01:00
George Krajcsovits	37418b5910	Merge pull request #17166 from Naman-B-Parlecha/NamanParlecha/NHCBtoCH Unroll NHCBs to Classic Histograms func for RW	2025-10-30 08:44:26 +01:00
Naman-B-Parlecha	f14c515cbe	fix(histogram): handling +Inf bucket count and metric label Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>	2025-10-28 20:29:44 +05:30
György Krajcsovits	fbd5353a19	Merge remote-tracking branch 'origin/release-3.7' into krajo/merge-release-372-to-main	2025-10-22 18:02:22 +02:00
Julien Pivotto	c9d4689e0b	relabeling: Fix labelmap action validation with legacy metric name scheme Fixes #17370 In Prometheus v3.7.0, using labelmap actions with replacement patterns containing regex variables (e.g., `$1`, `${1}`) would fail validation when `metric_name_validation_scheme` was set to `legacy`, causing Prometheus to fail at startup with: "$1" is invalid 'replacement' for labelmap action This was a regression as the same configuration worked in v3.6.0. The issue was in the validation logic: while UTF-8 validation correctly allowed `$` characters, legacy validation incorrectly used `IsValidLabelName` which rejects `$` characters. The fix ensures legacy validation uses `relabelTargetLegacy` regex which explicitly supports regex template variables. Added test cases to verify labelmap validation works with both `$1` and `${1}` replacement patterns under legacy validation scheme. Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>	2025-10-22 10:13:06 +02:00
Linas Medžiūnas	44df626620	promql (histograms): reconcile mismatched NHCB bounds (#17278) Fixes #17255. The implementation happens mostly in the Add and Sub method, but the reconciliation works for all relevant operations. For example, you can now `rate` over a range wherein the custom bucket boundaries are changing. Any custom bucket reconciliation is flagged with an info-level annotation. --------- Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com> Signed-off-by: Linas Medžiūnas <linasm@users.noreply.github.com>	2025-10-18 01:03:52 +02:00
Julius Volz	8b1bd7d6c3	ui: Allow viewing detailed relabeling steps for each discovered target This adds: * A `ScrapePoolConfig()` method to the scrape manager that allows getting the scrape config for a given pool. * An API endpoint at `/api/v1/targets/relabel_steps` that takes a pool name and a label set of a target and returns a detailed list of applied relabeling rules and their output for each step. * A "show relabeling" link/button for each target on the discovery page that shows the detailed flow of all relabeling rules (based on the API response) for that target. Note that this changes the JSON encoding of the relabeling rule config struct to output the original snake_case (instead of camelCase) field names, and before merging, we need to be sure that's ok :) See my comment about that at https://github.com/prometheus/prometheus/pull/15383#issuecomment-3405591487 Fixes https://github.com/prometheus/prometheus/issues/17283 Signed-off-by: Julius Volz <julius.volz@gmail.com>	2025-10-15 15:33:27 +02:00
beorn7	ad7d1aed99	Phase out native histogram feature flag The detailed plan for this is laid out in https://github.com/prometheus/prometheus/issues/16572 . This commit adds a global and local scrape config option `scrape_native_histograms`, which has to be set to true to ingest native histograms. To ease the transition, the feature flag is changed to simply set the default of `scrape_native_histograms` to true. Further implications: - The default scrape protocols now depend on the `scrape_native_histograms` setting. - Everywhere else, histograms are now "on by default". Documentation beyond the one for the feature flag and the scrape config are deliberately left out. See https://github.com/prometheus/prometheus/pull/17232 for that. Signed-off-by: beorn7 <beorn@grafana.com>	2025-10-15 14:50:52 +02:00
beorn7	6a8cacdf6f	model/histogram: Fix checkHistogramCustomBounds to accept -Inf Signed-off-by: beorn7 <beorn@grafana.com>	2025-10-10 23:10:32 +02:00
Naman-B-Parlecha	1df1f53ea0	fix: Added Unroll support to Sparse NHCBs Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>	2025-10-10 19:12:30 +05:30
NamanParlecha	167cb350f1	Merge branch 'prometheus:main' into NamanParlecha/NHCBtoCH	2025-10-10 18:59:53 +05:30
beorn7	51e0982c91	promql(histograms): Fix counter reset hint handling when aggregating Fixes #17308. As explained adding the warn-annotation about conflicting counter reset hints doesn't happen consistently. Furthermore, because of incremental mean calculation being used so far (which includes subtraction), avg calculation always created gauge histograms. The fix is to make Sub behave like Add WRT counter reset handling, and then set the result of a subtraction to gauge explicitly in actual PromQL subtraction (rather than using Sub for something else, like incremental mean calculation). Also, track the presence of a CounterReset hint and a NotCounterReset hint separately for the entirety of aggregated histograms and create the warn-annotation based on that. As a minor fix, this commit also consistently creates the warn annotation in aggregation to be about "aggregation" rather than "subtraction" or "addition", because the latter are just internal operations within the aggregation, which is not of interest for the user. Signed-off-by: beorn7 <beorn@grafana.com>	2025-10-09 19:40:00 +02:00
Björn Rabenstein	f2fc492473	Merge pull request #17284 from linasm/custom-bucket-bounds-match-fn NHCB: Separate CustomBucketBoundsMatch from FloatBucketsMatch	2025-10-07 15:38:59 +02:00
Bartlomiej Plotka	a4da440dad	fix: Fix slicelabels corruption when used with proto decoding (#17150) * fix: Fix slicelabels corruption when used with proto decoding Alternative to https://github.com/prometheus/prometheus/pull/16957/ Signed-off-by: bwplotka <bwplotka@gmail.com> * addressed comments Signed-off-by: bwplotka <bwplotka@gmail.com> --------- Signed-off-by: bwplotka <bwplotka@gmail.com>	2025-10-07 12:06:48 +01:00
Naman-B-Parlecha	7871bcb465	fix(convert): error message Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>	2025-10-07 14:20:32 +05:30
Naman-B-Parlecha	79f3e76d89	fix(test): Comparing the labels correctly Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>	2025-10-07 00:22:25 +05:30
Naman-B-Parlecha	c072b0000a	fix(convert): fix typos in comments Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>	2025-10-06 23:05:01 +05:30
Naman-B-Parlecha	083d0fa835	refactor(convert): updated tests and moved formatOpenMetricsFloat Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>	2025-10-06 22:56:45 +05:30
Linas Medziunas	c16db58061	NHCB: Reject custom bucket bounds with NaN value Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>	2025-10-06 16:37:28 +03:00
Linas Medziunas	8caf1f1c41	[NHCB] Separate CustomBucketBoundsMatch from floatBucketsMatch Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>	2025-10-05 22:38:07 +03:00
Bryan Boreham	968d722bb2	Merge pull request #17212 from bboreham/no-simplify [PERF] Regex: stop calling Simplify	2025-10-02 10:51:04 +01:00
beorn7	3d7cf4c274	model/histogram: Validate non-negative count and zero bucket We have always validated that none of the buckets is negative. We should do the same for the count of observations and the zero bucket. Note that this was always implied in the protobuf exposition format because a count or a zero bucket population is ignored if it is not positive. Signed-off-by: beorn7 <beorn@grafana.com>	2025-10-01 16:40:41 +02:00
Charles Korn	a2adccadd2	Improve assertion failure message Signed-off-by: Charles Korn <charles.korn@grafana.com>	2025-10-01 09:30:24 +01:00
Bryan Boreham	7056c70647	Merge pull request #16851 from jingchanglu/main chore: fix some function names in comment	2025-09-30 12:54:48 +01:00
Naman-B-Parlecha	ed67a0cbf1	refactor(histogram): rename types for clarity in histogram conversion tests Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>	2025-09-25 17:40:10 +05:30
Naman-B-Parlecha	f71f911040	fix(lint): Changing tests Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>	2025-09-25 15:28:25 +05:30
Naman-B-Parlecha	73904b4c75	refactor(histogram): Converting to Absolute values and fixing the test Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>	2025-09-25 03:42:23 +05:30
György Krajcsovits	b6df8d3274	feat(chunkenc): allow more native histograms schemas Allow -9..52 schemas instead of just -4..8, but reduce resolution to 8 if above. The reduce code path will be slow, but we only expect it to happen if TSDB already has higher resolution samples and we are in a rollback. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> # Conflicts: # model/histogram/generic.go	2025-09-23 11:20:48 +02:00
György Krajcsovits	794c545930	Merge remote-tracking branch 'origin/main' into krajo/native-histogram-schema-validation	2025-09-23 10:51:02 +02:00
Minh Nguyen	d04550a9c4	[RW2] Return 400 error code for wrongly-formatted histograms (#17210) * return 400 error code Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> * fix Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> * add more cases Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> * format code Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> * nit_fixing Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com> --------- Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>	2025-09-23 07:24:46 +02:00
György Krajcsovits	5b39b79f5a	refactor error creation and tests Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-09-19 09:26:34 +02:00
George Krajcsovits	5e6900558a	Apply suggestions from code review Co-authored-by: Björn Rabenstein <beorn@grafana.com> Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>	2025-09-19 08:58:27 +02:00
Bryan Boreham	c743b2f3cd	[PERF] Regex: stop calling Simplify It slows down compilation and doesn't make any of our benchmarks go faster. Assumed to be something that helped at an earlier point, but doesn't help now. Add a benchmark with a more complicated regex to demonstrate the slowdown. Signed-off-by: Bryan Boreham <bjboreham@gmail.com>	2025-09-18 11:20:14 +01:00
György Krajcsovits	267be7dc20	fix(chunkenc): error out when reading unknown histogram schemas from chunks Otherwise higher level code like PromQL needs to constantly check if it can handle the samples. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-09-18 09:21:03 +02:00
Naman-B-Parlecha	5eeba3638d	adding comment for ConvertNHCBToClassicHistogram Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>	2025-09-17 15:48:57 +05:30
Naman-B-Parlecha	c8e3f8c97a	drop(flag): moving feature flag to other pr Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>	2025-09-17 15:32:16 +05:30
György Krajcsovits	bdf547ae9c	fix(nativehistograms): validation should fail on unsupported schemas Histogram.Validate and FloatHistogram.Validate now return an error on unsupported schemas. The scrape and remote-write handlers reduce the schema to the maximum allowed if it is above the maximum, but below the theoretical maximum of 52. For scrape the maximum is a configuration option, for remote-write it is 8. Note: The OTLP endpoint already does the reduction, without checking that it is below 52 as the spec does not specify a maximum. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-09-13 16:54:44 +02:00
NamanParlecha	594f9d63a5	refactor(textparse): Introduce Variadic options in textParse.New (#17155) * refactor(textparse): introduce ParserOptions struct for cleaner parser initialization Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com> * refactor(fuzz): update fuzzParseMetricWithContentType to use ParserOptions Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com> * refactor(parser): simplify ParserOptions usage in tests and implementations Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com> * refactor(parse): using variadic options Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com> * refactor(parser): add fallbackType & SymbolTable to variadic options Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com> * refactor(parser): private fields Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com> * refactor(scrape): compose parser options Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com> * refactor(parser): add comments Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com> * refactor(parser): update to use ParserOptions struct for configuration Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com> * refactor(scrape): remove unused parserOptions field from scrapeLoop Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com> * refactor(parser): update ParserOptions field names and add comments for clarity Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com> --------- Signed-off-by: Naman-B-Parlecha <namanparlecha@gmail.com>	2025-09-11 10:49:42 +01:00
George Krajcsovits	acd9aa0afb	fix(textparse/protobuf): metric family name corrupted by NHCB parser (#17156) * fix(textparse): implement NHCB parsing in ProtoBuf parser directly The NHCB conversion does some validation, but we can only return error from Parser.Next() not Parser.Histogram(). So the conversion needs to happen in Next(). There are 2 cases: 1. "always_scrape_classic_histograms" is enabled, in which case we convert after returning the classic series. This is to be consistent with the PromParser text parser, which collects NHCB while spitting out classic series; then returns the NHCB. 2. "always_scrape_classic_histograms" is disabled. In which case we never return the classic series. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * refactor(textparse): skip classic series instead of adding NHCB around Do not return the first classic series from the EntryType state, switch to EntrySeries. This means we need to start the histogram field state from -3, not -2. In EntrySeries, skip classic series if needed. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * reuse nhcb converter Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * test(textparse/nhcb): test corrupting metric family name NHCB parse doesn't always copy the metric name from the underlying parser. When called via HELP, UNIT, the string is directly referenced which means that the read-ahead of NHCB can corrupt it. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-09-08 17:26:41 +02:00
Arve Knudsen	913cc8f72b	Replace gopkg.in/yaml.v2 with go.yaml.in/yaml/v2 (#17151) * Replace gopkg.in/yaml.v2 with go.yaml.in/yaml/v2 * Upgrade to client_golang@v1.23.2 --------- Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2025-09-06 13:04:24 +02:00
George Krajcsovits	31e4d84edd	refactor(textparse): allow for parsers with direct NHCB support (#17153) Hide adding NHCB parser on top another parser in New() function so we can easily add direct NHCB capable parsers. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2025-09-06 11:45:44 +02:00

593 Commits