127 Commits

Author SHA1 Message Date
Bartlomiej Plotka
8e6b008608
feature: type-and-unit-labels (PROM-39 implementation) (#16228)
* feature: type-and-unit-labels (extended MetricIdentity)

Experimental implementation of https://github.com/prometheus/proposals/pull/39

Previous (unmerged) experiments:
* https://github.com/prometheus/prometheus/compare/main...dashpole:prometheus:type_and_unit_labels
* https://github.com/prometheus/prometheus/pull/16025

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Fix compilation errors

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>

Lint

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>

Revert change made to protobuf 'Accept' header

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>

Fix compilation errors for 'dedupelabels' tag

Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>

* Refactored into schema.Metadata

Signed-off-by: bwplotka <bwplotka@gmail.com>

* textparse: Added tests for PromParse

Signed-off-by: bwplotka <bwplotka@gmail.com>

* add OM tests.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* add proto tests

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* add schema label tests.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* addressed comments.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* fix tests.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* add promql tests.

Signed-off-by: bwplotka <bwplotka@gmail.com>

* lint

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Addressed comments.

Signed-off-by: bwplotka <bwplotka@gmail.com>

---------

Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Co-authored-by: Arthur Silva Sens <arthursens2005@gmail.com>
2025-05-17 09:37:25 +00:00
hardlydearly
ba4b058b7a refactor: use slices.Contains to simplify code
Signed-off-by: hardlydearly <799511800@qq.com>
2025-05-09 08:27:10 +02:00
Arve Knudsen
e7e3ab2824
Fix linting issues found by golangci-lint v2.0.2 (#16368)
* Fix linting issues found by golangci-lint v2.0.2

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-05-03 19:05:13 +02:00
Bryan Boreham
ca416c580c
Merge branch 'main' into slicelabels
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-05-02 10:31:57 +01:00
Bryan Boreham
b2c2146d7c
Labels: simpler/faster stringlabels encoding (#16069)
Instead of using varint to encode the size of each label, use a single
byte for size 0-254, or a flag value of 255 followed by the size in
3 bytes little-endian.

This reduces the amount of code, and also the number of branches in
commonly-executed code, so it runs faster.

The maximum allowed label name or value length is now 2^24 or 16MB.

Memory used by labels changes as follows:
* Labels from 0 to 127 bytes length: same
* From 128 to 254: 1 byte less
* From 255 to 16383: 2 bytes more
* From 16384 to 2MB: 1 byte more
* From 2MB to 16MB: same

Labels: panic on string too long.

Slightly more user-friendly than encoding bad data and finding out when
we decode.
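
The size encoding described above can be sketched as follows. This is a minimal illustration of the scheme as stated in this message, not the code from the commit; `appendSize`, `decodeSize` and `maxSize` are hypothetical names.

```go
package main

// maxSize is the largest length that fits in the 3-byte form.
// (Hypothetical constant for this sketch.)
const maxSize = 1<<24 - 1

// appendSize appends the encoded length n to dst: one byte for 0-254,
// otherwise the flag byte 255 followed by n in 3 bytes little-endian.
// It panics up front instead of silently encoding bad data.
func appendSize(dst []byte, n int) []byte {
	switch {
	case n < 0 || n > maxSize:
		panic("label name or value too long to encode")
	case n < 255:
		return append(dst, byte(n))
	default:
		return append(dst, 255, byte(n), byte(n>>8), byte(n>>16))
	}
}

// decodeSize reads a length starting at data[i] and returns it together
// with the index of the first byte after the length.
func decodeSize(data []byte, i int) (size, next int) {
	if b := data[i]; b < 255 {
		return int(b), i + 1
	}
	return int(data[i+1]) | int(data[i+2])<<8 | int(data[i+3])<<16, i + 4
}
```

Decoding needs a single comparison against 255, where a varint decoder loops over continuation bits.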

Clarify that Labels.Bytes() encoding can change

---------

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-04-30 10:53:48 +01:00
Lukasz Mierzwa
05088aaa12 Fix linter errors
Mostly comment issues and unused variables.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-04-15 18:04:41 +01:00
Lukasz Mierzwa
bb76966992 Use stringlabels by default
This removes the stringlabels build tag, makes that implementation the default one, and moves the old labels implementation under the slicelabels build tag.
Fixes #16064.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-04-15 17:52:24 +01:00
wellweek
4e91f13db2 refactor: use slices.Equal to simplify code
Signed-off-by: wellweek <xiezitai@outlook.com>
2025-03-27 12:17:35 +01:00
Owen Williams
94b43c5d4c utf8: Remove support for legacy global validation setting
Global and Data Source configurations can specify legacy mode, but Prometheus now requires that the overall validation mode be set to UTF-8

Signed-off-by: Owen Williams <owen.williams@grafana.com>
2025-03-13 10:47:24 -04:00
Matthieu MOREL
c7d4b53ec1 chore: enable unused-parameter from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-02-19 19:50:28 +01:00
Bryan Boreham
ac4f8a5e23
[ENHANCEMENT] TSDB: Improve calculation of space used by labels (#13880)
* [ENHANCEMENT] TSDB: Improve calculation of space used by labels

The labels for each series in the Head take up some space in the
Postings index, but far more space in the `memSeries` structure.

Instead of having the Postings index calculate this overhead, which is
a layering violation, have the caller pass in a function to do it.

Provide three implementations of this function for the three Labels
versions.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-12-16 09:42:52 +00:00
Owen Williams
8d4bcd2c77 promql: Fix various UTF-8 bugs related to quoting
Fixes UTF-8 aggregator label list items getting mutated with quote marks when String-ified.
Fixes quoted metric names not supported in metric declarations.
Fixes UTF-8 label names not being quoted when String-ified.

Fixes https://github.com/prometheus/prometheus/issues/15470
Fixes https://github.com/prometheus/prometheus/issues/15528

Signed-off-by: Owen Williams <owen.williams@grafana.com>
Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
2024-12-04 14:18:59 -05:00
Arve Knudsen
89bbb885e5
Upgrade to golangci-lint v1.62.0 (#15424)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-11-20 17:22:20 +01:00
Bryan Boreham
5571c7dc98 FastRegexMatcher: use stack memory for lowercase copy of string
For values up to 32 bytes, this saves garbage and runs faster.
For prefixes, only `toLower` the part we need for the map lookup.

Split toNormalisedLower into fast and slow paths, to avoid a penalty
for the `copy` call in the case where no allocations are done.
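
A rough sketch of the stack-buffer idea, assuming an ASCII-only fast path and a fallback to `strings.ToLower` for everything else. `containsFold` is a hypothetical helper, and the real matcher also applies Unicode normalisation, which is left out here.

```go
package main

import (
	"strings"
	"unicode/utf8"
)

// containsFold reports whether the lowercase form of s is a key in set.
// Short ASCII strings are lowercased into a fixed-size array, which the
// compiler can usually keep on the stack, so the lookup produces no garbage.
func containsFold(set map[string]struct{}, s string) bool {
	var buf [32]byte
	if len(s) > len(buf) {
		// Slow path: too long for the fixed buffer.
		_, ok := set[strings.ToLower(s)]
		return ok
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		if c >= utf8.RuneSelf {
			// Slow path: non-ASCII input needs full case mapping.
			_, ok := set[strings.ToLower(s)]
			return ok
		}
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A'
		}
		buf[i] = c
	}
	// The compiler can typically perform a map lookup keyed by
	// string(byteSlice) without allocating a new string.
	_, ok := set[string(buf[:len(s)])]
	return ok
}
```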

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-10-28 16:28:58 +00:00
Bryan Boreham
31c5760551
Neater string vs byte-slice conversions (#14425)
unsafe.Slice and unsafe.StringData were added in Go 1.20
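
For illustration, zero-copy conversions built on these functions might look like the sketch below. The helper names `yoloString` and `yoloBytes` are assumptions for this example, and it also uses `unsafe.String` and `unsafe.SliceData`, which arrived in the same Go release.

```go
package main

import "unsafe"

// yoloString views b as a string without copying.
// The caller must not modify b while the returned string is in use.
func yoloString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// yoloBytes views s as a byte slice without copying.
// The caller must never write to the returned slice.
func yoloBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}
```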

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-09-21 12:19:21 +02:00
Mario Fernandez
5814920601
Fix: optimize .* regexp performance
Shortcut for `.*` matches newlines as well.
Add preamble change ^(?s:
Add test
dotAll flag for the regex
Add and fix regex tests

Signed-off-by: Mario Fernandez <mariofer@redhat.com>
2024-09-17 12:18:31 +02:00
Owen Williams
9da75328ea
fix(utf8): ensure correct validation when legacy mode turned on (#14736)
fix(utf8): ensure correct validation when legacy mode turned on

This depends on the included update of the prometheus/common dependency.

---------

Signed-off-by: Owen Williams <owen.williams@grafana.com>
2024-08-28 17:15:42 +02:00
beorn7
0f760f63dd lint: Revamp our linting rules, mostly around doc comments
Several things done here:

- Set `max-issues-per-linter` to 0 so that we actually see all linter
  warnings and not just 50 per linter. (As we also set
  `max-same-issues` to 0, I assume this was the intention from the
  beginning.)

- Stop using the golangci-lint default excludes (by setting
  `exclude-use-default: false`). Those are too generous and don't match
  our style conventions. (I have re-added some of the excludes
  explicitly in this commit. See below.)

- Re-add the `errcheck` exclusion we have used so far via the
  defaults.

- Exclude the signature requirement `govet` has for `Seek` methods
  because we use non-standard `Seek` methods a lot. (But we keep other
  requirements, while the default excludes completely disabled the
  check for common method signatures.)

- Exclude warnings about missing doc comments on exported symbols. (We
  used to be pretty adamant about doc comments, but stopped that at
  some point in the past. By now, we have about 500 missing doc
  comments. We may consider reintroducing this check, but that's
  outside of the scope of this commit. The default excludes of
  golangci-lint essentially ignore doc comments completely.)

- By no longer using the default excludes, we now get warnings back on
  malformed doc comments. That's the most impactful change in this
  commit. It does not enforce doc comments (again), but _if_ there is
  a doc comment, it has to have the recommended form. (Most of the
  changes in this commit are fixing this form.)

- Improve wording/spelling of some comments in .golangci.yml, and
  remove an outdated comment.

- Leave `package-comments` inactive, but add a TODO asking if we
  should change that.

- Add a new sub-linter `comment-spacings` (and fix corresponding
  comments), which avoids missing spaces after the leading `//`.

Signed-off-by: beorn7 <beorn@grafana.com>
2024-08-22 17:36:11 +02:00
Bryan Boreham
d84282b105 Labels: use single byte as separator - small speedup
Since `seps` is a variable, `seps[0]` has to be bounds-checked every
time. Replacing with a constant everywhere it is used skips this
overhead.
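
A small sketch of the difference, using hypothetical helpers `joinVar` and `joinConst` and assuming `0xff` as the separator byte:

```go
package main

// seps mirrors the old style: a package-level slice holding the separator.
var seps = []byte{'\xff'}

// sep is the same separator expressed as a constant.
const sep = '\xff'

// joinVar appends each name followed by seps[0]. Because seps is a
// variable, each use has to load the slice and bounds-check the index.
func joinVar(names ...string) []byte {
	var b []byte
	for _, n := range names {
		b = append(b, n...)
		b = append(b, seps[0])
	}
	return b
}

// joinConst produces the same bytes, but the separator is a constant,
// so there is no slice load and no bounds check.
func joinConst(names ...string) []byte {
	var b []byte
	for _, n := range names {
		b = append(b, n...)
		b = append(b, sep)
	}
	return b
}
```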

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-07-15 09:47:16 +01:00
Bryan Boreham
82a8c6abe2
[ENHANCEMENT] Optimize regexps with multiple prefixes (#13843)
For example `foo.*|bar.*|baz.*`. Instead of checking each one in turn,
we build a map of prefixes, then check the smaller set that could match
the string supplied.
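
The idea can be sketched roughly as below, assuming every alternative has the form `prefix.*` and that `.*` matches any remainder. `prefixSet` and its methods are hypothetical names, not the types used in the actual matcher.

```go
package main

import "strings"

// prefixSet indexes literal prefixes by their first keyLen bytes.
type prefixSet struct {
	keyLen int
	byKey  map[string][]string
}

// newPrefixSet builds the index, keyed by the length of the shortest prefix
// so that every prefix contributes a key of the same size.
func newPrefixSet(prefixes []string) *prefixSet {
	p := &prefixSet{byKey: map[string][]string{}}
	if len(prefixes) == 0 {
		return p
	}
	p.keyLen = len(prefixes[0])
	for _, pre := range prefixes[1:] {
		if len(pre) < p.keyLen {
			p.keyLen = len(pre)
		}
	}
	for _, pre := range prefixes {
		key := pre[:p.keyLen]
		p.byKey[key] = append(p.byKey[key], pre)
	}
	return p
}

// matches checks only the prefixes that share s's first keyLen bytes,
// instead of trying every alternative in turn.
func (p *prefixSet) matches(s string) bool {
	if len(s) < p.keyLen {
		return false
	}
	for _, pre := range p.byKey[s[:p.keyLen]] {
		if strings.HasPrefix(s, pre) {
			return true
		}
	}
	return false
}
```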

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Improve testing and readability

Address review comments on #13843

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-07-03 18:45:36 +01:00
Marco Pracucci
ec31acaf02
FastRegexMatcher: small optimization for the literal prefix case
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-07-01 10:12:50 +02:00
Bryan Boreham
7a82e4b503 Labels benchmarks: remove artefact of small symbol-tables
Symbol tables with fewer than 128 entries, so everything can be
represented as a single byte, are not realistic.

Stuff the symbol table with fake entries before adding the real ones.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-06-21 16:49:10 +01:00
Bryan Boreham
2ba7bc9446 Labels: further optimisation for dedupelabels
Inline (by copy-paste) the fast path of `decodeVarint` in various
places where it gets called a lot.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-06-21 16:46:13 +01:00
Bryan Boreham
2ced2f6aec [PERF] Labels: faster varint for dedupelabels
Including tests.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-06-21 11:57:09 +01:00
Bryan Boreham
84602bbace
Merge branch 'main' into fix-matcher-string-with-empty-label-name
2024-06-19 05:56:25 -04:00
Oleg Zaytsev
4f78cc809c
Refactor toNormalisedLower: shorter and slightly faster. (#14299)
Refactor toNormalisedLower: shorter and slightly faster

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-06-18 09:57:37 +00:00
Oleg Zaytsev
03cf6141d4
Fix Matcher.String() with empty label name
When the label name is empty, which can now happen with a quoted label
name, it should be quoted when printed as a string again.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-06-13 18:46:35 +02:00
Ranveer Avhad
39902ba694
[BUGFIX] FastRegexpMatcher: do Unicode normalization as part of case-insensitive comparison (#14170)
* Converted string to standardized form
* Added golang.org/x/text in Go dependencies
* Added test cases for FastRegexMatcher
* Added benchmark for toNormalizedLower

Signed-off-by: RA <ranveeravhad777@gmail.com>
2024-06-10 18:31:41 -04:00
Marco Pracucci
d966ae6400
Optimize containsInOrder() inlining it
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-06-04 10:34:15 +02:00
Marco Pracucci
a0807733be
Improved tests
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-06-04 10:34:15 +02:00
Marco Pracucci
78fdd2188d
Improve contains check done by FastRegexMatcher
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-06-04 10:34:15 +02:00
Bryan Boreham
1e0b0e250a
Merge pull request #14090 from colega/improve-zeroOrOneCharacterStringMatcher-Matches
Improve `zeroOrOneCharacterStringMatcher` by using `utf8.DecodeRuneInString`
2024-05-16 09:28:53 +01:00
Oleksandr Redko
f10c3454e9 Enable perfsprint linter and fix up code
Signed-off-by: Oleksandr Redko <oleksandr.red+github@gmail.com>
2024-05-15 17:51:05 +03:00
Oleg Zaytsev
8b4c9459a2
Check utf8.RuneError result
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-05-13 17:44:26 +02:00
Oleg Zaytsev
dbe88fae22
Add invalid utf8 test cases to regexp
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-05-13 17:05:31 +02:00
Oleg Zaytsev
bcff5059e6
Use utf8.DecodeRuneInString(s)
This replaces the custom `moreThanOneRune` function with the standard
`utf8.DecodeRuneInString(s)` that can be used to figure out the size of
the first rune.
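
A minimal sketch of that check, with `zeroOrOneRune` as a hypothetical stand-in for the matcher method (it ignores any newline handling the real matcher may do):

```go
package main

import "unicode/utf8"

// zeroOrOneRune reports whether s is empty or consists of exactly one rune.
// Comparing the size of the first rune with len(s) avoids counting bytes,
// which would wrongly reject a single multibyte character.
func zeroOrOneRune(s string) bool {
	if s == "" {
		return true
	}
	// For invalid UTF-8, DecodeRuneInString returns (utf8.RuneError, 1),
	// so a single invalid byte is treated as one character here.
	_, size := utf8.DecodeRuneInString(s)
	return size == len(s)
}
```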

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-05-13 15:41:00 +02:00
Oleg Zaytsev
fdfc6d4725
Benchmark zeroOrOneCharacterStringMatcher.Matches
This adds some more test cases for unicode values, and also a benchmark
for zeroOrOneCharacterStringMatcher.Matches()

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-05-13 15:36:55 +02:00
Oleg Zaytsev
b7b4355807
Use bytes.Buffer from stack buf in Matcher.String()
Also removed the growing until there's a benchmark for that.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-05-09 10:00:24 +02:00
Oleg Zaytsev
6ebda5a7bc
Optimize Matcher.String()
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-05-08 17:05:27 +02:00
Oleg Zaytsev
dabd789fd5
Quote label name in matchers when needed
When the label name of a matcher contains non-standard characters, like
a dot, or starts with a digit, it should be quoted.

If it's not quoted, then `VectorSelector.String()` isn't a valid PromQL.
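
As a sketch of the rule, assuming the classic label-name pattern `[a-zA-Z_][a-zA-Z0-9_]*` decides what can stay unquoted. `isPlainLabelName` and `matcherString` are hypothetical helpers, not the functions changed in this commit.

```go
package main

import "strconv"

// isPlainLabelName reports whether name matches [a-zA-Z_][a-zA-Z0-9_]*,
// that is, whether it can be printed without quotes.
func isPlainLabelName(name string) bool {
	if name == "" {
		return false
	}
	for i := 0; i < len(name); i++ {
		c := name[i]
		switch {
		case c == '_', 'a' <= c && c <= 'z', 'A' <= c && c <= 'Z':
			// Always allowed.
		case '0' <= c && c <= '9' && i > 0:
			// Digits are allowed anywhere except the first position.
		default:
			return false
		}
	}
	return true
}

// matcherString renders name, operator and value, quoting the name
// only when it is not a plain label name.
func matcherString(name, op, value string) string {
	if isPlainLabelName(name) {
		return name + op + strconv.Quote(value)
	}
	return strconv.Quote(name) + op + strconv.Quote(value)
}
```

Under this sketch an empty label name is also quoted, so it prints as `""="x"` rather than `="x"`.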

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-05-08 16:58:51 +02:00
Oleg Zaytsev
2524a91591
Fix FastRegexMatcher matching multibyte runes with . (#14059)
When `zeroOrOneCharacterStringMatcher` was checking the input string,
it assumed that if there is more than one byte, then there is more
than one rune, but that's not necessarily true.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-05-07 16:33:37 +02:00
Matthieu MOREL
d496687c8e golangci-lint: enable usestdlibvars linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-04-08 19:26:23 +00:00
Bryan Boreham
7c28521451 [TESTS] Truncate some long test names, for readability
The strings produced by these tests can run to thousands of characters,
which makes test logs difficult to read.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-04-03 10:10:39 +01:00
carehabit
a672662073
all: fix some typos (#13863)
Signed-off-by: carehabit <shenyuting@outlook.com>
2024-04-01 18:06:05 +02:00
Domantas
435f330d0b
[BUGFIX] labels: don't modify original labels in DropMetricName (#13845)
Restrict the capacity of first argument to `append()` to force an allocation.
This is for the slice implementation only.
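
The capacity trick can be illustrated on a plain string slice. `dropAt` is a hypothetical helper for this sketch, not the `DropMetricName` code itself.

```go
package main

// dropAt returns ls without the element at index i.
// The three-index slice ls[:i:i] caps the capacity at i, so append cannot
// write into the caller's backing array and must allocate a new one
// whenever there are elements to add.
func dropAt(ls []string, i int) []string {
	return append(ls[:i:i], ls[i+1:]...)
}
```

With a plain `append(ls[:i], ls[i+1:]...)` the tail would be shifted down inside the original array, silently changing the caller's slice.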

Signed-off-by: Domantas Jadenkus <djadenkus@gmail.com>
2024-03-27 10:35:17 +00:00
Bryan Boreham
48786ad4e8 Use slices instead of exp/slices
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-25 12:20:18 +00:00
Bryan Boreham
080d440bf8 Merge remote-tracking branch 'origin/main' into pr/13461
2024-03-25 12:14:26 +00:00
Oleg Zaytsev
d12e785075
Improve readability
As suggested by @bboreham

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-03-18 11:16:09 +01:00
Oleg Zaytsev
9699598952
Improve Labels.Compare performance w/stringlabels
I was bored on a train and I spent some amount of time trying to scratch
some nanoseconds off the Labels.Compare when running with stringlabels.

I would be ashamed to admit the real amount of time I spent on it.

The worst thing is, I can't really explain why this is performing so
much better, and someone should re-run the benchmarks on their machine
to confirm that it's not something related to general relativity because
the train is moving. I also added some extra real-life benchmark cases
with longer labelsets (these aren't the longest we have in production,
but kubernetes labelsets are fairly common in Prometheus so I thought it
would be nice to have them).

My benchmarks show this diff:

goos: darwin
goarch: arm64
pkg: github.com/prometheus/prometheus/model/labels
                                       │     old     │                 new                 │
                                       │   sec/op    │   sec/op     vs base                │
Labels_Compare/equal                     5.898n ± 0%   5.875n ± 1%   -0.40% (p=0.037 n=10)
Labels_Compare/not_equal                 11.78n ± 2%   11.01n ± 1%   -6.54% (p=0.000 n=10)
Labels_Compare/different_sizes           4.959n ± 1%   4.906n ± 2%   -1.05% (p=0.050 n=10)
Labels_Compare/lots                      21.32n ± 0%   17.54n ± 5%  -17.75% (p=0.000 n=10)
Labels_Compare/real_long_equal           15.06n ± 1%   14.92n ± 0%   -0.93% (p=0.000 n=10)
Labels_Compare/real_long_different_end   25.20n ± 0%   24.43n ± 0%   -3.04% (p=0.000 n=10)
geomean                                  11.86n        11.25n        -5.16%

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-03-17 17:08:06 +01:00
Bryan Boreham
0bb5588386
labels: optimize String method (#13673)
Use a stack buffer to reduce memory allocations.

`Write(AppendQuote(AvailableBuffer` does not allocate or copy when
the buffer has sufficient space.
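
A sketch of the pattern, assuming Go 1.21 or later for `bytes.Buffer.AvailableBuffer`. `formatLabels` and the 1024-byte buffer size are illustrative choices, not taken from the commit.

```go
package main

import (
	"bytes"
	"strconv"
)

// formatLabels renders name/value pairs as {name="value", ...}.
func formatLabels(pairs [][2]string) string {
	// Fixed-size array used as the buffer's initial storage; the intent
	// is that it stays on the stack while the output fits.
	var stack [1024]byte
	b := bytes.NewBuffer(stack[:0])
	b.WriteByte('{')
	for i, p := range pairs {
		if i > 0 {
			b.WriteString(", ")
		}
		b.WriteString(p[0])
		b.WriteByte('=')
		// AppendQuote writes into the buffer's spare capacity, and Write
		// then only has to extend the length when the bytes are in place.
		b.Write(strconv.AppendQuote(b.AvailableBuffer(), p[1]))
	}
	b.WriteByte('}')
	return b.String()
}
```

Only the final `b.String()` call allocates while the output fits in the initial buffer.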

Also add a benchmark, with some refactoring.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-03-12 11:34:03 +00:00