843 Commits

Author SHA1 Message Date
TJ Hoplock
6ebfbd2d54 chore!: adopt log/slog, remove go-kit/log
For: #14355

This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-10-07 15:58:50 -04:00
Julien
9d8f93ea92
Merge pull request #15113 from yeya24/add-missing-allow-overlapping-compaction-flag
Add missing flag storage.tsdb.allow-overlapping-compaction
2024-10-07 11:17:24 +02:00
Matthieu MOREL
ab64966e9d
fix: use "ErrorContains" or "EqualError" instead of "Contains(t, err.Error()" and "Equal(t, err.Error()" (#15094)
* fix: use "ErrorContains" or "EqualError" instead of "Contains(t, err.Error()" and "Equal(t, err.Error()"

---------

Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-10-06 16:35:29 +00:00
Ben Ye
6c3d11629b add missing flag storage.tsdb.allow-overlapping-compaction
Signed-off-by: Ben Ye <benye@amazon.com>
2024-10-05 17:05:42 -07:00
Julien
297667469a
Merge pull request #15090 from roidelapluie/querylogrule
Fix flakiness of QueryLogTest
2024-10-04 14:57:24 +02:00
Julien
9d275c23de cmd/prometheus: Fix flakiness of QueryLogTest
Now we check that a rule execution has taken place.

This also reduces the time to run the rules tests from 45s to 25s.

Signed-off-by: Julien <roidelapluie@o11y.eu>
2024-10-04 11:23:57 +02:00
Julien
21e0f83b68 Move notifications in utils
Signed-off-by: Julien <roidelapluie@o11y.eu>
2024-10-04 10:11:56 +02:00
Julien
b6158e8956 Notify web UI when starting up and shutting down
Signed-off-by: Julien <roidelapluie@o11y.eu>
2024-10-03 10:27:04 +02:00
Jesus Vazquez
77d3b3aff3
OTLP: Remove experimental word form OTLP receiver (#14894)
The OTLP receiver can now considered stable. We've had it for longer
than a year in main and has received constant improvements.

Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
2024-10-01 14:36:52 +02:00
Julien
f9bbad1148 Limit the number of SSE Subscribers to 16 by default
Signed-off-by: Julien <roidelapluie@o11y.eu>
2024-09-27 15:51:51 +02:00
Julien
7aa4721373
Merge pull request #14946 from roidelapluie/notifications
Add notifications to the Web UI
2024-09-27 15:50:43 +02:00
Julien
6cde0096e2 Add notifications to the web UI when configuration reload fails.
This commit introduces a new `/api/v1/notifications/live` endpoint that
utilizes Server-Sent Events (SSE) to stream notifications to the web UI.
This is used to display alerts such as when a configuration reload
has failed.

I opted for SSE over WebSockets because SSE is simpler to implement and
more robust for our use case. Since we only need one-way communication
from the server to the client, SSE fits perfectly without the overhead
of establishing and maintaining a two-way WebSocket connection.

When the SSE connection fails, we go back to a classic
/api/v1/notifications API endpoint.

This commit also contains the required UI changes for the new Mantine UI.

Signed-off-by: Julien <roidelapluie@o11y.eu>
2024-09-27 15:28:38 +02:00
Bryan Boreham
410fcce6f0
Remove unnecessary pprof import (#14988)
The pattern of `import _ "net/http/pprof"` adds handlers to the default
http handler, but Prometheus does not use that. There are explicit
handlers in `web/web.go`.

So, we can remove this line with no impact to behaviour.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-09-27 07:45:49 +01:00
Björn Rabenstein
f74722841b
Merge pull request #14160 from alex-kattathra-johnson/issue-13959
Remove no-default-scrape-port featureFlag
2024-09-26 18:45:56 +02:00
Arthur Silva Sens
6bd9b1a7cc
Histogram CT Zero ingestion
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
2024-09-26 11:29:22 -03:00
Alex Johnson
be0f10054e Remove no-default-scrape-port featureFlag
Signed-off-by: Alex Johnson <alex.kattathra.johnson@gmail.com>
2024-09-25 10:13:19 -05:00
Julien
919648cafc
Merge pull request #14947 from roidelapluie/reloadinvalidyaml
fix(autoreload): Reload invalid yaml files
2024-09-23 10:03:23 +02:00
Björn Rabenstein
5b9148e552
Merge pull request #14820 from charleskorn/promqltest-native-histogram-format
promqltest: use test expression format for histograms in assertion failure messages and include reset hint in the test expression
2024-09-20 16:47:08 +02:00
Julien
146b22d196 fix(autoreload): Reload invalid yaml files
When a YAML file is invalid, trigger auto-reload anyway so that user is
aware that the configuration file is incorrect.

Failing to do so does not change the reload status in metrics and api.

Signed-off-by: Julien <roidelapluie@o11y.eu>
2024-09-20 13:38:05 +02:00
Julius Volz
b8d1336d42
Merge pull request #14912 from roidelapluie/notready
mantine UI: Distinguish between Not Ready and Stopping
2024-09-17 19:40:13 +02:00
Julien
ac5377873f mantine UI: Distinguish between Not Ready and Stopping
Signed-off-by: Julien <roidelapluie@o11y.eu>
2024-09-17 16:02:16 +02:00
Carrie Edwards
14e3c05ce8
tsdb: Add support for ingestion of out-of-order native histogram samples (#14546)
Add support for ingesting OOO native histograms

* Add flag for enabling and disabling OOO native histogram ingestion
* Update OOO querying tests to include native histogram samples
* Add OOO head tests
* Add test for OOO native histogram counter reset headers

Signed-off-by: Carrie Edwards <edwrdscarrie@gmail.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored by: Carrie Edwards <edwrdscarrie@gmail.com>
Co-authored by: Jeanette Tan <jeanette.tan@grafana.com>
Co-authored by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored by: Fiona Liao <fiona.liao@grafana.com>
2024-09-17 11:19:06 +02:00
Augustin Husson
6febfbb3be
put back feature flag 'delayed-compaction' and 'old-ui' (#14909)
Signed-off-by: Augustin Husson <husson.augustin@gmail.com>
2024-09-16 15:46:28 +02:00
Nathan Baulch
50cd453c8f
chore: Fix typos (#14868)
* Fix typos

---------

Signed-off-by: Nathan Baulch <nathan.baulch@gmail.com>
2024-09-10 22:32:03 +02:00
Jan Fajerski
fa318711f4 Merge branch 'main' into 3.0-main-sync-24-09-09
Conflicts:
	cmd/prometheus/main.go
	docs/command-line/prometheus.md
	docs/feature_flags.md
	web/ui/build_ui.sh
	web/web.go
    Resolved by dropping the UTF-8 feature flag and adding the
    `auto-reload-config` feature flag.
    For the new web ui pick all changes from `main`.
2024-09-09 15:44:22 +02:00
Julius Volz
ad21ef3d76 Merge branch 'main' into mantine-ui
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2024-09-06 21:26:00 +02:00
Björn Rabenstein
694d98032b
Merge pull request #14705 from prometheus/owilliams/default-on
utf8: enable utf-8 support by default
2024-09-06 17:24:37 +02:00
Owen Williams
88bb05c3e8 utf8: enable utf-8 support by default
This change causes Prometheus to allow all UTF-8 characters in metric and label names.
This means that names that were previously invalid and would have been previously rejected will be allowed through.

Signed-off-by: Owen Williams <owen.williams@grafana.com>
2024-09-06 08:48:11 -04:00
Julien
404e577034
Merge pull request #14734 from roidelapluie/scrape_failure_logger
Add support for logging scrape failures to a specified file
2024-09-06 11:16:50 +02:00
machine424
cc40b65ab4 fix(promtool): use the final database path for --sandbox-dir-root instead of the default value as it may be overridden
add a regression test for that.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-09-05 15:25:32 +02:00
machine424
d18fa62ae9
chore(discovery): enable new-service-discovery-manager by default and drop legacymanager package
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-09-05 12:46:03 +02:00
Julien
ce0f09b125 Scrape: Add scrape_failure_log_file to log Scrape Failures
Signed-off-by: Julien <roidelapluie@o11y.eu>
2024-09-05 11:01:40 +02:00
Julien
50f5327f83
Merge pull request #14769 from roidelapluie/autoreload
Support reload config automatically
2024-09-05 10:47:07 +02:00
Julius Volz
c73c3e24d7 Default to serving new (Prometheus 3.0) UI
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2024-09-04 21:33:37 +02:00
Julius Volz
490eef6509 Merge branch 'main' into mantine-ui
Signed-off-by: Julius Volz <julius.volz@gmail.com>
2024-09-04 20:58:19 +02:00
Arthur Silva Sens
ca1f30bb6e Promote Agent mode to it's own cmdline flag
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
2024-09-04 20:06:35 +02:00
Jan Fajerski
befcfadf78 Fix merge conflicts
Fix call to newTestEngine(t) in promql/engine_test.go:3214.
`agent` feature-flag it's own cmdline flag now.
Remove `scrape.name-escaping-scheme` argument.

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2024-09-04 20:06:17 +02:00
Jan Fajerski
fe4289b502 Merge branch 'main' into HEAD 2024-09-04 18:50:00 +02:00
Charles Korn
4da551578c
Fix test broken by inclusion of counter_reset_hint
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-09-04 16:33:18 +10:00
Julien
cfe8b5616f Only update hash when config reload succeeds
Signed-off-by: Julien <roidelapluie@o11y.eu>
2024-09-03 11:36:58 +02:00
Arve Knudsen
5dfbcc390e Merge remote-tracking branch 'prometheus/main' into arve/close-engine
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-09-02 16:26:59 +02:00
Jan Fajerski
00315ce15e Merge branch 'main' into 3.0-main-sync-24-08-30
using -Xours

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2024-09-02 11:27:18 +02:00
Julien
1cd2d0498b Support reload config automatically
Signed-off-by: Julien <roidelapluie@o11y.eu>
2024-08-30 17:12:44 +02:00
Jorge Creixell
e9e3d64b7c
PromQL engine: Delay deletion of __name__ label to the end of the query evaluation (#14477)
PromQL engine: Delay deletion of __name__ label to the end of the query evaluation

  - This change allows optionally preserving the `__name__` label via the `label_replace` and `label_join` functions, and helps prevent the dreaded "vector cannot contain metrics with the same labelset" error.
  - The implementation extends the `Series` and `Sample` structs with a boolean flag indicating whether the `__name__` label should be deleted at the end of the query evaluation.
  - The `label_replace` and `label_join` functions can still access the value of the `__name__` label, even if it has been previously marked for deletion. If  `__name__` is used as target label, it won't be dropped at the end of the query evaluation.
  - Fixes https://github.com/prometheus/prometheus/issues/11397
  - See https://github.com/jcreixell/prometheus/pull/2 for previous discussion, including the decision to create this PR and benchmark it before considering other alternatives (like refactoring `labels.Labels`).
  - See https://github.com/jcreixell/prometheus/pull/1 for an alternative implementation using a special label instead of boolean flags.
  - Note: a feature flag  `promql-delayed-name-removal` has been added as it changes the behavior of some "weird" queries (see https://github.com/prometheus/prometheus/issues/11397#issuecomment-1451998792)

Example (this always fails, as `__name__` is being dropped by `count_over_time`):

```
count_over_time({__name__!=""}[1m])

=> Error executing query: vector cannot contain metrics with the same labelset
```

Before:

```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")

=> Error executing query: vector cannot contain metrics with the same labelset
```

After:

```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")

=>
count_go_gc_cycles_automatic_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
count_go_gc_cycles_forced_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
...
```

Signed-off-by: Jorge Creixell <jcreixell@gmail.com>

---------

Signed-off-by: Jorge Creixell <jcreixell@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
2024-08-29 15:50:39 +02:00
Arve Knudsen
99204f23ee Merge remote-tracking branch 'prometheus/main' into arve/close-engine
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-08-29 09:52:54 +02:00
Suraj Patil
7757794bb3
[ENHANCEMENT] Promtool: Adding labels to time series while creating tsdb blocks (#14403)
* feat: #14402 - Adding labels to time series while creating tsdb blocks

Signed-off-by: Suraj Patil <patilsuraj767@gmail.com>
2024-08-28 12:12:24 +10:00
Arve Knudsen
c9a460d570 Merge remote-tracking branch 'prometheus/main' into arve/close-engine
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-08-26 12:17:10 +02:00
Julien
349068ad3e
Merge pull request #14665 from roidelapluie/multiple-listening-addresses
Add support for multiple listening addresses
2024-08-26 10:31:23 +02:00
beorn7
0f760f63dd lint: Revamp our linting rules, mostly around doc comments
Several things done here:

- Set `max-issues-per-linter` to 0 so that we actually see all linter
  warnings and not just 50 per linter. (As we also set
  `max-same-issues` to 0, I assume this was the intention from the
  beginning.)

- Stop using the golangci-lint default excludes (by setting
  `exclude-use-default: false`. Those are too generous and don't match
  our style conventions. (I have re-added some of the excludes
  explicitly in this commit. See below.)

- Re-add the `errcheck` exclusion we have used so far via the
  defaults.

- Exclude the signature requirement `govet` has for `Seek` methods
  because we use non-standard `Seek` methods a lot. (But we keep other
  requirements, while the default excludes completely disabled the
  check for common method segnatures.)

- Exclude warnings about missing doc comments on exported symbols. (We
  used to be pretty adamant about doc comments, but stopped that at
  some point in the past. By now, we have about 500 missing doc
  comments. We may consider reintroducing this check, but that's
  outside of the scope of this commit. The default excludes of
  golangci-lint essentially ignore doc comments completely.)

- By stop using the default excludes, we now get warnings back on
  malformed doc comments. That's the most impactful change in this
  commit. It does not enforce doc comments (again), but _if_ there is
  a doc comment, it has to have the recommended form. (Most of the
  changes in this commit are fixing this form.)

- Improve wording/spelling of some comments in .golangci.yml, and
  remove an outdated comment.

- Leave `package-comments` inactive, but add a TODO asking if we
  should change that.

- Add a new sub-linter `comment-spacings` (and fix corresponding
  comments), which avoids missing spaces after the leading `//`.

Signed-off-by: beorn7 <beorn@grafana.com>
2024-08-22 17:36:11 +02:00
Jan Fajerski
5138922b0d Merge branch 'main' into 3.0-main-sync-24-08-21 2024-08-21 09:09:36 +02:00