From 1b818b03d5e3c2fb112bc87eb5ca126dc4295ca4 Mon Sep 17 00:00:00 2001
From: Julius Volz
Date: Tue, 13 May 2025 15:37:57 +0200
Subject: [PATCH] Clean up codeboxes and headings in docs

The new docs site will have syntax highlighting, so this adds language
tags to code boxes that are currently missing them. I didn't add
`promql` as a language since the highlighter doesn't support it yet;
also, a lot of the PromQL codeboxes in our docs aren't strictly valid
PromQL, they are more like multiple expressions listed in the same
code box on multiple lines. So I'm leaving that for some time later.

In the HTTP API page, I moved the curl examples from the JSON codeboxes
to their own ones above the JSON output. I considered putting an
"Output:" text between the curl command and the JSON output, but I
think the way it currently looks without it is probably fine.

I also fixed a number of headings which were at the wrong level
relative to their nesting in the document.

I also removed `go` as a language from the Go template language
examples, because the Go template language isn't Go at all.

I also adjusted the indent on one codebox to be more reasonable (2
spaces instead of 8).

And finally, my editor made a bunch of whitespace changes
automatically, like removing trailing spaces.

Signed-off-by: Julius Volz
---
 docs/configuration/alerting_rules.md     |  10 +-
 docs/configuration/configuration.md      |  42 +++---
 docs/configuration/https.md              |   2 +-
 docs/configuration/recording_rules.md    |  17 +--
 docs/configuration/template_examples.md  |  12 +-
 docs/configuration/template_reference.md |   4 +-
 docs/feature_flags.md                    |  10 +-
 docs/getting_started.md                  |   2 +-
 docs/migration.md                        |  80 +++++------
 docs/querying/api.md                     | 170 +++++++++++++++++------
 docs/storage.md                          |  24 ++--
 11 files changed, 228 insertions(+), 145 deletions(-)

diff --git a/docs/configuration/alerting_rules.md b/docs/configuration/alerting_rules.md
index 0a442876c3..daddf71773 100644
--- a/docs/configuration/alerting_rules.md
+++ b/docs/configuration/alerting_rules.md
@@ -11,7 +11,7 @@ to an external service.
 Whenever the alert expression results in one or more vector elements at a given
 point in time, the alert counts as active for these elements' label sets.
 
-### Defining alerting rules
+## Defining alerting rules
 
 Alerting rules are configured in Prometheus in the same way as [recording
 rules](recording_rules.md).
@@ -54,7 +54,7 @@ values can be templated.
 The `annotations` clause specifies a set of informational labels that can be used to store longer additional
 information such as alert descriptions or runbook links. The annotation values can be templated.
 
-#### Templating
+### Templating
 
 Label and annotation values can be templated using [console
 templates](https://prometheus.io/docs/visualization/consoles). The `$labels`
@@ -93,7 +93,7 @@ groups:
       description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
 ```
 
-### Inspecting alerts during runtime
+## Inspecting alerts during runtime
 
 To manually inspect which alerts are active (pending or firing), navigate to
 the "Alerts" tab of your Prometheus instance. This will show you the exact
@@ -105,7 +105,7 @@ The sample value is set to `1` as long as the alert is in the indicated active
 (pending or firing) state, and the series is marked stale when this is no
 longer the case.
 
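+For example, the following expression lists all alerts that are currently
+firing:
+
+```
+ALERTS{alertstate="firing"}
+```
+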
-### Sending alert notifications
+## Sending alert notifications
 
 Prometheus's alerting rules are good at figuring what is broken *right now*, but
 they are not a fully-fledged notification solution. Another layer is needed to
@@ -114,6 +114,6 @@ on top of the simple alert definitions. In Prometheus's ecosystem, the
 [Alertmanager](https://prometheus.io/docs/alerting/alertmanager/) takes on this
 role. Thus, Prometheus may be configured to periodically send information about
 alert states to an Alertmanager instance, which then takes care of dispatching
-the right notifications. 
+the right notifications.
 Prometheus can be [configured](configuration.md) to automatically discover
 available Alertmanager instances through its service discovery integrations.
diff --git a/docs/configuration/configuration.md b/docs/configuration/configuration.md
index 058e5e750c..1c095c075e 100644
--- a/docs/configuration/configuration.md
+++ b/docs/configuration/configuration.md
@@ -80,9 +80,9 @@ global:
   [ rule_query_offset: <duration> | default = 0s ]
 
   # The labels to add to any time series or alerts when communicating with
-  # external systems (federation, remote storage, Alertmanager). 
-  # Environment variable references `${var}` or `$var` are replaced according 
-  # to the values of the current environment variables. 
+  # external systems (federation, remote storage, Alertmanager).
+  # Environment variable references `${var}` or `$var` are replaced according
+  # to the values of the current environment variables.
   # References to undefined variables are replaced by the empty string.
   # The `$` character can be escaped by using `$$`.
   external_labels:
     [ <labelname>: <labelvalue> ... ]
@@ -195,7 +195,7 @@ otlp:
   # It preserves all special character like dots and won't append special suffixes for metric
   # unit and type.
   #
-  # WARNING: The "NoTranslation" setting has significant known risks and limitations (see https://prometheus.io/docs/practices/naming/ 
+  # WARNING: The "NoTranslation" setting has significant known risks and limitations (see https://prometheus.io/docs/practices/naming/
   # for details):
   #   * Impaired UX when using PromQL in plain YAML (e.g. alerts, rules, dashboard, autoscaling configuration).
   #   * Series collisions which in the best case may result in OOO errors, in the worst case a silently malformed
@@ -484,21 +484,21 @@ metric_relabel_configs:
 # that will be kept in memory. 0 means no limit.
 [ keep_dropped_targets: <int> | default = 0 ]
 
-# Specifies the validation scheme for metric and label names. Either blank or 
+# Specifies the validation scheme for metric and label names. Either blank or
 # "utf8" for full UTF-8 support, or "legacy" for letters, numbers, colons, and
 # underscores.
 [ metric_name_validation_scheme: <string> | default "utf8" ]
 
 # Specifies the character escaping scheme that will be requested when scraping
 # for metric and label names that do not conform to the legacy Prometheus
-# character set. Available options are: 
+# character set. Available options are:
 #   * `allow-utf-8`: Full UTF-8 support, no escaping needed.
 #   * `underscores`: Escape all legacy-invalid characters to underscores.
 #   * `dots`: Escapes dots to `_dot_`, underscores to `__`, and all other
 #     legacy-invalid characters to underscores.
 #   * `values`: Prepend the name with `U__` and replace all invalid
 #     characters with their unicode value, surrounded by underscores. Single
-#     underscores are replaced with double underscores. 
+#     underscores are replaced with double underscores.
 #     e.g. "U__my_2e_dotted_2e_name".
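+# Similarly, under the `dots` scheme a scraped metric named `my.metric_name`
+# would be requested as `my_dot_metric__name`.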
# If this value is left blank, Prometheus will default to `allow-utf-8` if the # validation scheme for the current scrape config is set to utf8, or @@ -517,7 +517,7 @@ metric_relabel_configs: # reduced as much as possible until it is within the limit. # To set an upper limit for the schema (equivalent to "scale" in OTel's # exponential histograms), use the following factor limits: -# +# # +----------------------------+----------------------------+ # | growth factor | resulting schema AKA scale | # +----------------------------+----------------------------+ @@ -547,7 +547,7 @@ metric_relabel_configs: # +----------------------------+----------------------------+ # | 1.002 | 8 | # +----------------------------+----------------------------+ -# +# # 0 results in the smallest supported factor (which is currently ~1.0027 or # schema 8, but might change in the future). [ native_histogram_min_bucket_factor: | default = 0 ] @@ -564,7 +564,7 @@ Where `` must be unique across all scrape configurations. A `http_config` allows configuring HTTP requests. -``` +```yaml # Sets the `Authorization` header on every request with the # configured username and password. # username and username_file are mutually exclusive. @@ -795,7 +795,7 @@ The following meta labels are available on targets during [relabeling](#relabel_ * `__meta_consul_address`: the address of the target * `__meta_consul_dc`: the datacenter name for the target * `__meta_consul_health`: the health status of the service -* `__meta_consul_partition`: the admin partition name where the service is registered +* `__meta_consul_partition`: the admin partition name where the service is registered * `__meta_consul_metadata_`: each node metadata key value of the target * `__meta_consul_node`: the node name defined for the target * `__meta_consul_service_address`: the service address of the target @@ -942,7 +942,7 @@ host: [ host_networking_host: | default = "localhost" ] # Sort all non-nil networks in ascending order based on network name and -# get the first network if the container has multiple networks defined, +# get the first network if the container has multiple networks defined, # thus avoiding collecting duplicate targets. [ match_first_network: | default = true ] @@ -1258,7 +1258,7 @@ The following meta labels are available on targets during [relabeling](#relabel_ #### `loadbalancer` -The `loadbalancer` role discovers one target per Octavia loadbalancer with a +The `loadbalancer` role discovers one target per Octavia loadbalancer with a `PROMETHEUS` listener. The target address defaults to the VIP address of the load balancer. @@ -1471,7 +1471,7 @@ and serves as an interface to plug in custom service discovery mechanisms. It reads a set of files containing a list of zero or more ``s. Changes to all defined files are detected via disk watches -and applied immediately. +and applied immediately. While those individual files are watched for changes, the parent directory is also watched implicitly. This is to handle [atomic @@ -1984,7 +1984,7 @@ See below for the configuration options for Kuma MonitoringAssignment discovery: # Address of the Kuma Control Plane's MADS xDS server. server: -# Client id is used by Kuma Control Plane to compute Monitoring Assignment for specific Prometheus backend. +# Client id is used by Kuma Control Plane to compute Monitoring Assignment for specific Prometheus backend. # This is useful when migrating between multiple Prometheus backends, or having separate backend for each Mesh. 
# When not specified, system hostname/fqdn will be used if available, if not `prometheus` will be used. [ client_id: ] @@ -2082,7 +2082,7 @@ The following meta labels are available on targets during [relabeling](#relabel_ * `__meta_linode_status`: the status of the linode instance * `__meta_linode_tags`: a list of tags of the linode instance joined by the tag separator * `__meta_linode_group`: the display group a linode instance is a member of -* `__meta_linode_gpus`: the number of GPU's of the linode instance +* `__meta_linode_gpus`: the number of GPU's of the linode instance * `__meta_linode_hypervisor`: the virtualization software powering the linode instance * `__meta_linode_backups`: the backup service status of the linode instance * `__meta_linode_specs_disk_bytes`: the amount of storage space the linode instance has access to @@ -2603,7 +2603,7 @@ input to a subsequent relabeling step), use the `__tmp` label name prefix. This prefix is guaranteed to never be used by Prometheus itself. ```yaml -# The source_labels tells the rule what labels to fetch from the series. Any +# The source_labels tells the rule what labels to fetch from the series. Any # labels which do not exist get a blank value (""). Their content is concatenated # using the configured separator and matched against the configured regular expression # for the replace, keep, and drop actions. @@ -2894,7 +2894,7 @@ write_relabel_configs: # For the `io.prometheus.write.v2.Request` message, this option is noop (always true). [ send_native_histograms: | default = false ] -# When enabled, remote-write will resolve the URL host name via DNS, choose one of the IP addresses at random, and connect to it. +# When enabled, remote-write will resolve the URL host name via DNS, choose one of the IP addresses at random, and connect to it. # When disabled, remote-write relies on Go's standard behavior, which is to try to connect to each address in turn. # The connection timeout applies to the whole operation, i.e. in the latter case it is spread over all attempt. # This is an experimental feature, and its behavior might still change, or even get removed. @@ -2927,7 +2927,7 @@ azuread: # Azure User-assigned Managed identity. [ managed_identity: - [ client_id: ] ] + [ client_id: ] ] # Azure OAuth. [ oauth: @@ -3055,8 +3055,8 @@ with this feature. # that is within the out-of-order window, or (b) too-old, i.e. not in-order # and before the out-of-order window. # -# When out_of_order_time_window is greater than 0, it also affects experimental agent. It allows -# the agent's WAL to accept out-of-order samples that fall within the specified time window relative +# When out_of_order_time_window is greater than 0, it also affects experimental agent. It allows +# the agent's WAL to accept out-of-order samples that fall within the specified time window relative # to the timestamp of the last appended sample for the same series. [ out_of_order_time_window: | default = 0s ] ``` diff --git a/docs/configuration/https.md b/docs/configuration/https.md index bc83e07a38..ba6ed0f814 100644 --- a/docs/configuration/https.md +++ b/docs/configuration/https.md @@ -27,7 +27,7 @@ Generic placeholders are defined as follows: A valid example file can be found [here](/documentation/examples/web-config.yml). -``` +```yaml tls_server_config: # Certificate and key files for server to use to authenticate to client. 
 cert_file: <filename>
diff --git a/docs/configuration/recording_rules.md b/docs/configuration/recording_rules.md
index 45a263292b..ebcb893ff5 100644
--- a/docs/configuration/recording_rules.md
+++ b/docs/configuration/recording_rules.md
@@ -34,7 +34,7 @@ When the file is syntactically valid, the checker prints a textual
 representation of the parsed rules to standard output and then exits
 with a `0` return status.
 
-If there are any syntax errors or invalid input arguments, it prints an error 
+If there are any syntax errors or invalid input arguments, it prints an error
 message to standard error and exits with a `1` return status.
 
 ## Recording rules
@@ -71,7 +71,8 @@ groups:
 ```
 
 ### `<rule_group>`
-```
+
+```yaml
 # The name of the group. Must be unique within a file.
 name: <string>
 
@@ -98,7 +99,7 @@ rules:
 
 The syntax for recording rules is:
 
-```
+```yaml
 # The name of the time series to output to. Must be a valid metric name.
 record: <string>
 
@@ -114,7 +115,7 @@ labels:
 
 The syntax for alerting rules is:
 
-```
+```yaml
 # The name of the alert. Must be a valid label value.
 alert: <string>
 
@@ -143,7 +144,7 @@ annotations:
 See also the
 [best practices for naming metrics created by recording
 rules](https://prometheus.io/docs/practices/rules/#recording-rules).
 
-# Limiting alerts and series
+## Limiting alerts and series
 
 A limit for alerts produced by alerting rules and series produced recording rules
 can be configured per-group. When the limit is exceeded, _all_ series produced
@@ -152,9 +153,9 @@ the rule, active, pending, or inactive, are cleared as well. The event will be
 recorded as an error in the evaluation, and as such no stale markers are
 written.
 
-# Rule query offset
+## Rule query offset
 This is useful to ensure the underlying metrics have been received and stored in Prometheus. Metric availability delays are more likely to occur when Prometheus is running as a remote write target due to the nature of distributed systems, but can also occur when there's anomalies with scraping and/or short evaluation intervals.
 
-# Failed rule evaluations due to slow evaluation
+## Failed rule evaluations due to slow evaluation
 
-If a rule group hasn't finished evaluating before its next evaluation is supposed to start (as defined by the `evaluation_interval`), the next evaluation will be skipped. Subsequent evaluations of the rule group will continue to be skipped until the initial evaluation either completes or times out. When this happens, there will be a gap in the metric produced by the recording rule. The `rule_group_iterations_missed_total` metric will be incremented for each missed iteration of the rule group. 
+If a rule group hasn't finished evaluating before its next evaluation is supposed to start (as defined by the `evaluation_interval`), the next evaluation will be skipped. Subsequent evaluations of the rule group will continue to be skipped until the initial evaluation either completes or times out. When this happens, there will be a gap in the metric produced by the recording rule. The `rule_group_iterations_missed_total` metric will be incremented for each missed iteration of the rule group.
diff --git a/docs/configuration/template_examples.md b/docs/configuration/template_examples.md
index 672295343f..2d12fb7129 100644
--- a/docs/configuration/template_examples.md
+++ b/docs/configuration/template_examples.md
@@ -13,7 +13,7 @@ templating](https://golang.org/pkg/text/template/) system.
 
 ## Simple alert field templates
 
-```
+```yaml
 alert: InstanceDown
 expr: up == 0
 for: 5m
@@ -33,7 +33,7 @@ console instead.
This displays a list of instances, and whether they are up: -```go +``` {{ range query "up" }} {{ .Labels.instance }} {{ .Value }} {{ end }} @@ -43,7 +43,7 @@ The special `.` variable contains the value of the current sample for each loop ## Display one value -```go +``` {{ with query "some_metric{instance='someinstance'}" }} {{ . | first | value | humanize }} {{ end }} @@ -58,7 +58,7 @@ formatting of results, and linking to the [expression browser](https://prometheu ## Using console URL parameters -```go +``` {{ with printf "node_memory_MemTotal{job='node',instance='%s'}" .Params.instance | query }} {{ . | first | value | humanize1024 }}B {{ end }} @@ -95,7 +95,7 @@ powerful when combined with [console library](template_reference.md#console-templates) support, allowing sharing of templates across consoles. -```go +``` {{/* Define the template */}} {{define "myTemplate"}} do something @@ -107,7 +107,7 @@ sharing of templates across consoles. Templates are limited to one argument. The `args` function can be used to wrap multiple arguments. -```go +``` {{define "myMultiArgTemplate"}} First argument: {{.arg0}} Second argument: {{.arg1}} diff --git a/docs/configuration/template_reference.md b/docs/configuration/template_reference.md index ec4b31376c..91920ae6a8 100644 --- a/docs/configuration/template_reference.md +++ b/docs/configuration/template_reference.md @@ -17,8 +17,8 @@ The primary data structure for dealing with time series data is the sample, defi ```go type sample struct { - Labels map[string]string - Value interface{} + Labels map[string]string + Value interface{} } ``` diff --git a/docs/feature_flags.md b/docs/feature_flags.md index 174184072e..c4fb04e2d1 100644 --- a/docs/feature_flags.md +++ b/docs/feature_flags.md @@ -23,7 +23,7 @@ Exemplar storage is implemented as a fixed size circular buffer that stores exem `--enable-feature=memory-snapshot-on-shutdown` -This takes a snapshot of the chunks that are in memory along with the series information when shutting down and stores it on disk. This will reduce the startup time since the memory state can now be restored with this snapshot +This takes a snapshot of the chunks that are in memory along with the series information when shutting down and stores it on disk. This will reduce the startup time since the memory state can now be restored with this snapshot and m-mapped chunks, while a WAL replay from disk is only needed for the parts of the WAL that are not part of the snapshot. ## Extra scrape metrics @@ -183,7 +183,7 @@ This state is periodically ([`max_stale`][d2c]) cleared of inactive series. Enabling this _can_ have negative impact on performance, because the in-memory state is mutex guarded. Cumulative-only OTLP requests are not affected. -### PromQL arithmetic expressions in time durations +## PromQL arithmetic expressions in time durations `--enable-feature=promql-duration-expr` @@ -203,7 +203,7 @@ The following operators are supported: * `+` - addition * `-` - subtraction -* `*` - multiplication +* `*` - multiplication * `/` - division * `%` - modulo * `^` - exponentiation @@ -227,7 +227,7 @@ When enabled, allows for the native ingestion of delta OTLP metrics, storing the Currently, the StartTimeUnixNano field is ignored, and deltas are given the unknown metric metadata type. -Delta support is in a very early stage of development and the ingestion and querying process my change over time. For the open proposal see [prometheus/proposals#48](https://github.com/prometheus/proposals/pull/48). 
+Delta support is in a very early stage of development and the ingestion and querying process may change over time. For the open proposal see [prometheus/proposals#48](https://github.com/prometheus/proposals/pull/48).
 
 ### Querying
 
@@ -246,4 +246,4 @@ These may not work well if the `<range>` is not a multiple of the collection int
 
 * It is difficult to figure out whether a metric has delta or cumulative temporality, since there's no indication of temporality in metric names or labels. For now, if you are ingesting a mix of delta and cumulative metrics we advise you to explicitly add your own labels to distinguish them. In the future, we plan to introduce type labels to consistently distinguish metric types and potentially make PromQL functions type-aware (e.g. providing warnings when cumulative-only functions are used with delta metrics).
 
-* If there are multiple samples being ingested at the same timestamp, only one of the points is kept - the samples are **not** summed together (this is how Prometheus works in general - duplicate timestamp samples are rejected). Any aggregation will have to be done before sending samples to Prometheus.
\ No newline at end of file
+* If there are multiple samples being ingested at the same timestamp, only one of the points is kept - the samples are **not** summed together (this is how Prometheus works in general - duplicate timestamp samples are rejected). Any aggregation will have to be done before sending samples to Prometheus.
diff --git a/docs/getting_started.md b/docs/getting_started.md
index 82bae9b8d4..012c4fd240 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -200,7 +200,7 @@ To record the time series resulting from this expression into a new metric
 called `job_instance_mode:node_cpu_seconds:avg_rate5m`, create a file
 with the following recording rule and save it as `prometheus.rules.yml`:
 
-```
+```yaml
 groups:
 - name: cpu-node
   rules:
diff --git a/docs/migration.md b/docs/migration.md
index d9a1148951..e06741bcec 100644
--- a/docs/migration.md
+++ b/docs/migration.md
@@ -11,52 +11,52 @@ This document offers guidance on migrating from Prometheus 2.x to Prometheus 3.0
 
 ## Flags
 
-- The following feature flags have been removed and they have been added to the 
+- The following feature flags have been removed and they have been added to the
   default behavior of Prometheus v3:
   - `promql-at-modifier`
   - `promql-negative-offset`
   - `new-service-discovery-manager`
   - `expand-external-labels`
-    - Environment variable references `${var}` or `$var` in external label values 
-      are replaced according to the values of the current environment variables. 
+    - Environment variable references `${var}` or `$var` in external label values
+      are replaced according to the values of the current environment variables.
     - References to undefined variables are replaced by the empty string.
       The `$` character can be escaped by using `$$`.
   - `no-default-scrape-port`
-    - Prometheus v3 will no longer add ports to scrape targets according to the 
+    - Prometheus v3 will no longer add ports to scrape targets according to the
      specified scheme. Target will now appear in labels as configured.
- - If you rely on scrape targets like - `https://example.com/metrics` or `http://example.com/metrics` to be - represented as `https://example.com/metrics:443` and + - If you rely on scrape targets like + `https://example.com/metrics` or `http://example.com/metrics` to be + represented as `https://example.com/metrics:443` and `http://example.com/metrics:80` respectively, add them to your target URLs - `agent` - Instead use the dedicated `--agent` CLI flag. - `remote-write-receiver` - Instead use the dedicated `--web.enable-remote-write-receiver` CLI flag to enable the remote write receiver. - `auto-gomemlimit` - - Prometheus v3 will automatically set `GOMEMLIMIT` to match the Linux - container memory limit. If there is no container limit, or the process is - running outside of containers, the system memory total is used. To disable + - Prometheus v3 will automatically set `GOMEMLIMIT` to match the Linux + container memory limit. If there is no container limit, or the process is + running outside of containers, the system memory total is used. To disable this, `--no-auto-gomemlimit` is available. - `auto-gomaxprocs` - - Prometheus v3 will automatically set `GOMAXPROCS` to match the Linux + - Prometheus v3 will automatically set `GOMAXPROCS` to match the Linux container CPU quota. To disable this, `--no-auto-gomaxprocs` is available. - Prometheus v3 will log a warning if you continue to pass these to + Prometheus v3 will log a warning if you continue to pass these to `--enable-feature`. ## Configuration -- The scrape job level configuration option `scrape_classic_histograms` has been - renamed to `always_scrape_classic_histograms`. If you use the - `--enable-feature=native-histograms` feature flag to ingest native histograms - and you also want to ingest classic histograms that an endpoint might expose - along with native histograms, be sure to add this configuration or change your +- The scrape job level configuration option `scrape_classic_histograms` has been + renamed to `always_scrape_classic_histograms`. If you use the + `--enable-feature=native-histograms` feature flag to ingest native histograms + and you also want to ingest classic histograms that an endpoint might expose + along with native histograms, be sure to add this configuration or change your configuration from the old name. -- The `http_config.enable_http2` in `remote_write` items default has been - changed to `false`. In Prometheus v2 the remote write http client would - default to use http2. In order to parallelize multiple remote write queues +- The `http_config.enable_http2` in `remote_write` items default has been + changed to `false`. In Prometheus v2 the remote write http client would + default to use http2. In order to parallelize multiple remote write queues across multiple sockets its preferable to not default to http2. - If you prefer to use http2 for remote write you must now set + If you prefer to use http2 for remote write you must now set `http_config.enable_http2: true` in your `remote_write` configuration section. ## PromQL @@ -137,7 +137,7 @@ may now fail if this fallback protocol is not specified. ### TSDB format and downgrade -The TSDB format has been changed slightly in Prometheus v2.55 in preparation for changes +The TSDB format has been changed slightly in Prometheus v2.55 in preparation for changes to the index format. Consequently, a Prometheus v3 TSDB can only be read by a Prometheus v2.55 or newer. 
Keep that in mind when upgrading to v3 -- you will be only able to downgrade to v2.55, not lower, without losing your TSDB persistent data. @@ -147,8 +147,8 @@ confirm Prometheus works as expected, before upgrading to v3. ### TSDB storage contract -TSDB compatible storage is now expected to return results matching the specified -selectors. This might impact some third party implementations, most likely +TSDB compatible storage is now expected to return results matching the specified +selectors. This might impact some third party implementations, most likely implementing `remote_read`. This contract is not explicitly enforced, but can cause undefined behavior. @@ -179,7 +179,7 @@ scrape_configs: ``` ### Log message format -Prometheus v3 has adopted `log/slog` over the previous `go-kit/log`. This +Prometheus v3 has adopted `log/slog` over the previous `go-kit/log`. This results in a change of log message format. An example of the old log format is: ``` @@ -198,19 +198,19 @@ time=2024-10-24T00:03:07.542+02:00 level=INFO source=/home/user/go/src/github.co ``` ### `le` and `quantile` label values -In Prometheus v3, the values of the `le` label of classic histograms and the +In Prometheus v3, the values of the `le` label of classic histograms and the `quantile` label of summaries are normalized upon ingestion. In Prometheus v2 -the value of these labels depended on the scrape protocol (protobuf vs text -format) in some situations. This led to label values changing based on the -scrape protocol. E.g. a metric exposed as `my_classic_hist{le="1"}` would be -ingested as `my_classic_hist{le="1"}` via the text format, but as -`my_classic_hist{le="1.0"}` via protobuf. This changed the identity of the +the value of these labels depended on the scrape protocol (protobuf vs text +format) in some situations. This led to label values changing based on the +scrape protocol. E.g. a metric exposed as `my_classic_hist{le="1"}` would be +ingested as `my_classic_hist{le="1"}` via the text format, but as +`my_classic_hist{le="1.0"}` via protobuf. This changed the identity of the metric and caused problems when querying the metric. -In Prometheus v3 these label values will always be normalized to a float like -representation. I.e. the above example will always result in -`my_classic_hist{le="1.0"}` being ingested into prometheus, no matter via which -protocol. The effect of this change is that alerts, recording rules and -dashboards that directly reference label values as whole numbers such as +In Prometheus v3 these label values will always be normalized to a float like +representation. I.e. the above example will always result in +`my_classic_hist{le="1.0"}` being ingested into prometheus, no matter via which +protocol. The effect of this change is that alerts, recording rules and +dashboards that directly reference label values as whole numbers such as `le="1"` will stop working. Ways to deal with this change either globally or on a per metric basis: @@ -236,11 +236,11 @@ This should **only** be applied to metrics that currently produce such labels. ``` ### Disallow configuring Alertmanager with the v1 API -Prometheus 3 no longer supports Alertmanager's v1 API. Effectively Prometheus 3 +Prometheus 3 no longer supports Alertmanager's v1 API. Effectively Prometheus 3 requires [Alertmanager 0.16.0](https://github.com/prometheus/alertmanager/releases/tag/v0.16.0) or later. 
 Users with older Alertmanager
-versions or configurations that use `alerting: alertmanagers: [api_version: v1]` 
+versions or configurations that use `alerting: alertmanagers: [api_version: v1]`
 need to upgrade Alertmanager and change their configuration to use
 `api_version: v2`.
 
-# Prometheus 2.0 migration guide
+## Prometheus 2.0 migration guide
 
-For the Prometheus 1.8 to 2.0 please refer to the [Prometheus v2.55 documentation](https://prometheus.io/docs/prometheus/2.55/migration/).
+For the migration guide from Prometheus 1.8 to 2.0 please refer to the [Prometheus v2.55 documentation](https://prometheus.io/docs/prometheus/2.55/migration/).
diff --git a/docs/querying/api.md b/docs/querying/api.md
index 033a2dfcf5..0bfaa63b0f 100644
--- a/docs/querying/api.md
+++ b/docs/querying/api.md
@@ -32,7 +32,7 @@ will be returned in the data field.
 
 The JSON response envelope format is as follows:
 
-```
+```json
 {
   "status": "success" | "error",
   "data": <data>,
@@ -96,7 +96,7 @@ query that may breach server-side URL character limits.
 
 The `data` section of the query result has the following format:
 
-```
+```json
 {
   "resultType": "matrix" | "vector" | "scalar" | "string",
   "result": <value>
@@ -110,8 +110,11 @@ formats](#expression-query-result-formats).
 The following example evaluates the expression `up` at the time
 `2015-07-01T20:10:51.781Z`:
 
+```bash
+curl 'http://localhost:9090/api/v1/query?query=up&time=2015-07-01T20:10:51.781Z'
+```
+
 ```json
-$ curl 'http://localhost:9090/api/v1/query?query=up&time=2015-07-01T20:10:51.781Z'
 {
    "status" : "success",
    "data" : {
@@ -163,7 +166,7 @@ query that may breach server-side URL character limits.
 
 The `data` section of the query result has the following format:
 
-```
+```json
 {
   "resultType": "matrix",
   "result": <value>
@@ -176,8 +179,11 @@ format](#range-vectors).
 The following example evaluates the expression `up` over a 30-second range with
 a query resolution of 15 seconds.
 
+```bash
+curl 'http://localhost:9090/api/v1/query_range?query=up&start=2015-07-01T20:10:30.781Z&end=2015-07-01T20:11:00.781Z&step=15s'
+```
+
 ```json
-$ curl 'http://localhost:9090/api/v1/query_range?query=up&start=2015-07-01T20:10:30.781Z&end=2015-07-01T20:11:00.781Z&step=15s'
 {
    "status" : "success",
    "data" : {
@@ -233,8 +239,11 @@ The `data` section of the query result is a string containing the formatted quer
 
 The following example formats the expression `foo/bar`:
 
+```bash
+curl 'http://localhost:9090/api/v1/format_query?query=foo/bar'
+```
+
 ```json
-$ curl 'http://localhost:9090/api/v1/format_query?query=foo/bar'
 {
    "status" : "success",
    "data" : "foo / bar"
@@ -264,8 +273,11 @@ The `data` section of the query result is a string containing the AST of the par
 
 The following example parses the expression `foo/bar`:
 
+```bash
+curl 'http://localhost:9090/api/v1/parse_query?query=foo/bar'
+```
+
 ```json
-$ curl 'http://localhost:9090/api/v1/parse_query?query=foo/bar'
 {
    "data" : {
       "bool" : false,
@@ -343,8 +355,11 @@ contain the label name/value pairs which identify each series.
 The following example returns all series that match either of the selectors
 `up` or `process_start_time_seconds{job="prometheus"}`:
 
+```bash
+curl -g 'http://localhost:9090/api/v1/series?' --data-urlencode 'match[]=up' --data-urlencode 'match[]=process_start_time_seconds{job="prometheus"}'
+```
+
 ```json
-$ curl -g 'http://localhost:9090/api/v1/series?'
--data-urlencode 'match[]=up' --data-urlencode 'match[]=process_start_time_seconds{job="prometheus"}' { "status" : "success", "data" : [ @@ -389,8 +404,11 @@ The `data` section of the JSON response is a list of string label names. Here is an example. +```bash +curl 'localhost:9090/api/v1/labels' +``` + ```json -$ curl 'localhost:9090/api/v1/labels' { "status": "success", "data": [ @@ -439,8 +457,11 @@ The `data` section of the JSON response is a list of string label values. This example queries for all label values for the `http_status_code` label: +```bash +curl http://localhost:9090/api/v1/label/http_status_code/values +``` + ```json -$ curl http://localhost:9090/api/v1/label/http_status_code/values { "status" : "success", "data" : [ @@ -462,8 +483,11 @@ Label names can optionally be encoded using the Values Escaping method, and is n This example queries for all label values for the `http.status_code` label: +```bash +curl http://localhost:9090/api/v1/label/U__http_2e_status_code/values +``` + ```json -$ curl http://localhost:9090/api/v1/label/U__http_2e_status_code/values { "status" : "success", "data" : [ @@ -489,8 +513,11 @@ URL query parameters: - `start=`: Start timestamp. - `end=`: End timestamp. +```bash +curl -g 'http://localhost:9090/api/v1/query_exemplars?query=test_exemplar_metric_total&start=2020-09-14T15:22:25.479Z&end=2020-09-14T15:23:25.479Z' +``` + ```json -$ curl -g 'http://localhost:9090/api/v1/query_exemplars?query=test_exemplar_metric_total&start=2020-09-14T15:22:25.479Z&end=2020-09-14T15:23:25.479Z' { "status": "success", "data": [ @@ -556,7 +583,7 @@ is explained in detail in its own section below. Range vectors are returned as result type `matrix`. The corresponding `result` property has the following format: -``` +```json [ { "metric": { "": "", ... }, @@ -578,7 +605,7 @@ and [`sort_by_label`](functions.md#sort_by_label) have no effect for range vecto Instant vectors are returned as result type `vector`. The corresponding `result` property has the following format: -``` +```json [ { "metric": { "": "", ... }, @@ -600,7 +627,7 @@ is used. Scalar results are returned as result type `scalar`. The corresponding `result` property has the following format: -``` +```json [ , "" ] ``` @@ -609,7 +636,7 @@ Scalar results are returned as result type `scalar`. The corresponding String results are returned as result type `string`. The corresponding `result` property has the following format: -``` +```json [ , "" ] ``` @@ -620,7 +647,7 @@ The `` placeholder used above is formatted as follows. _Note that native histograms are an experimental feature, and the format below might still change._ -``` +```json { "count": "", "sum": "", @@ -654,8 +681,11 @@ Dropped targets are subject to `keep_dropped_targets` limit, if set. `labels` represents the label set after relabeling has occurred. `discoveredLabels` represent the unmodified labels retrieved during service discovery before relabeling has occurred. +```bash +curl http://localhost:9090/api/v1/targets +``` + ```json -$ curl http://localhost:9090/api/v1/targets { "status": "success", "data": { @@ -704,9 +734,12 @@ The `state` query parameter allows the caller to filter by active or dropped tar Note that an empty array is still returned for targets that are filtered out. Other values are ignored. 
+```bash
+curl 'http://localhost:9090/api/v1/targets?state=active'
+```
+
 ```json
-$ curl 'http://localhost:9090/api/v1/targets?state=active'
-{
+{
   "status": "success",
   "data": {
     "activeTargets": [
@@ -737,9 +770,12 @@ $ curl 'http://localhost:9090/api/v1/targets?state=active'
 
 The `scrapePool` query parameter allows the caller to filter by scrape pool name.
 
+```bash
+curl 'http://localhost:9090/api/v1/targets?scrapePool=node_exporter'
+```
+
 ```json
-$ curl 'http://localhost:9090/api/v1/targets?scrapePool=node_exporter'
-{
+{
   "status": "success",
   "data": {
     "activeTargets": [
@@ -792,9 +828,11 @@ URL query parameters:
 - `group_limit=<number>`: The `group_limit` parameter allows you to specify a limit for the number of rule groups that is returned in a single response. If the total number of rule groups exceeds the specified `group_limit` value, the response will include a `groupNextToken` property. You can use the value of this `groupNextToken` property in subsequent requests in the `group_next_token` parameter to paginate over the remaining rule groups. The `groupNextToken` property will not be present in the final response, indicating that you have retrieved all the available rule groups. Please note that there are no guarantees regarding the consistency of the response if the rule groups are being modified during the pagination process.
 - `group_next_token`: the pagination token that was returned in previous request when the `group_limit` property is set. The pagination token is used to iteratively paginate over a large number of rule groups. To use the `group_next_token` parameter, the `group_limit` parameter also need to be present. If a rule group that coincides with the next token is removed while you are paginating over the rule groups, a response with status code 400 will be returned.
 
-```json
-$ curl http://localhost:9090/api/v1/rules
+```bash
+curl http://localhost:9090/api/v1/rules
+```
 
+```json
 {
   "data": {
     "groups": [
@@ -857,9 +895,11 @@ guarantees as the overarching API v1.
 GET /api/v1/alerts
 ```
 
-```json
-$ curl http://localhost:9090/api/v1/alerts
+```bash
+curl http://localhost:9090/api/v1/alerts
+```
 
+```json
 {
   "data": {
     "alerts": [
@@ -904,6 +944,9 @@ curl -G http://localhost:9091/api/v1/targets/metadata \
     --data-urlencode 'metric=go_goroutines' \
     --data-urlencode 'match_target={job="prometheus"}' \
     --data-urlencode 'limit=2'
+```
+
+```json
 {
   "status": "success",
   "data": [
@@ -932,9 +975,12 @@ curl -G http://localhost:9091/api/v1/targets/metadata \
 The following example returns metadata for all metrics for all targets with
 label `instance="127.0.0.1:9090"`.
 
-```json
+```bash
 curl -G http://localhost:9091/api/v1/targets/metadata \
     --data-urlencode 'match_target={instance="127.0.0.1:9090"}'
+```
+
+```json
 {
   "status": "success",
   "data": [
@@ -983,9 +1029,11 @@ The `data` section of the query result consists of an object where each key is a
 The following example returns two metrics. Note that the metric `http_requests_total` has more than one object in the list. At least one target has a value for `HELP` that do not match with the rest.
 
-```json
+```bash
 curl -G http://localhost:9090/api/v1/metadata?limit=2
+```
 
+```json
 {
   "status": "success",
   "data": {
@@ -1014,9 +1062,11 @@ curl -G http://localhost:9090/api/v1/metadata?limit=2
 
 The following example returns only one metadata entry for each metric.
-```json +```bash curl -G http://localhost:9090/api/v1/metadata?limit_per_metric=1 +``` +```json { "status": "success", "data": { @@ -1040,9 +1090,11 @@ curl -G http://localhost:9090/api/v1/metadata?limit_per_metric=1 The following example returns metadata only for the metric `http_requests_total`. -```json +```bash curl -G http://localhost:9090/api/v1/metadata?metric=http_requests_total +``` +```json { "status": "success", "data": { @@ -1073,8 +1125,11 @@ GET /api/v1/alertmanagers Both the active and dropped Alertmanagers are part of the response. +```bash +curl http://localhost:9090/api/v1/alertmanagers +``` + ```json -$ curl http://localhost:9090/api/v1/alertmanagers { "status": "success", "data": { @@ -1107,8 +1162,11 @@ GET /api/v1/status/config The config is returned as dumped YAML file. Due to limitation of the YAML library, YAML comments are not included. +```bash +curl http://localhost:9090/api/v1/status/config +``` + ```json -$ curl http://localhost:9090/api/v1/status/config { "status": "success", "data": { @@ -1127,8 +1185,11 @@ GET /api/v1/status/flags All values are of the result type `string`. +```bash +curl http://localhost:9090/api/v1/status/flags +``` + ```json -$ curl http://localhost:9090/api/v1/status/flags { "status": "success", "data": { @@ -1154,8 +1215,11 @@ GET /api/v1/status/runtimeinfo The returned values are of different types, depending on the nature of the runtime property. +```bash +curl http://localhost:9090/api/v1/status/runtimeinfo +``` + ```json -$ curl http://localhost:9090/api/v1/status/runtimeinfo { "status": "success", "data": { @@ -1190,8 +1254,11 @@ GET /api/v1/status/buildinfo All values are of the result type `string`. +```bash +curl http://localhost:9090/api/v1/status/buildinfo +``` + ```json -$ curl http://localhost:9090/api/v1/status/buildinfo { "status": "success", "data": { @@ -1232,8 +1299,11 @@ The `data` section of the query result consists of: - **memoryInBytesByLabelName** This will provide a list of the label names and memory used in bytes. Memory usage is calculated by adding the length of all values for a given label name. - **seriesCountByLabelPair** This will provide a list of label value pairs and their series count. +```bash +curl http://localhost:9090/api/v1/status/tsdb +``` + ```json -$ curl http://localhost:9090/api/v1/status/tsdb { "status": "success", "data": { @@ -1305,8 +1375,11 @@ GET /api/v1/status/walreplay - **in progress**: The replay is in progress. - **done**: The replay has finished. +```bash +curl http://localhost:9090/api/v1/status/walreplay +``` + ```json -$ curl http://localhost:9090/api/v1/status/walreplay { "status": "success", "data": { @@ -1338,8 +1411,11 @@ URL query parameters: - `skip_head=`: Skip data present in the head block. Optional. +```bash +curl -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot +``` + ```json -$ curl -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot { "status": "success", "data": { @@ -1371,8 +1447,8 @@ Not mentioning both start and end times would clear all the data for the matched Example: -```json -$ curl -X POST \ +```bash +curl -X POST \ -g 'http://localhost:9090/api/v1/admin/tsdb/delete_series?match[]=up&match[]=process_start_time_seconds{job="prometheus"}' ``` @@ -1392,8 +1468,8 @@ PUT /api/v1/admin/tsdb/clean_tombstones This takes no parameters or body. 
-```json -$ curl -XPOST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones +```bash +curl -XPOST http://localhost:9090/api/v1/admin/tsdb/clean_tombstones ``` *New in v2.1 and supports PUT from v2.9* @@ -1451,8 +1527,11 @@ GET /api/v1/notifications Example: +```bash +curl http://localhost:9090/api/v1/notifications +``` + ``` -$ curl http://localhost:9090/api/v1/notifications { "status": "success", "data": [ @@ -1477,8 +1556,11 @@ GET /api/v1/notifications/live Example: +```bash +curl http://localhost:9090/api/v1/notifications/live +``` + ``` -$ curl http://localhost:9090/api/v1/notifications/live data: { "status": "success", "data": [ diff --git a/docs/storage.md b/docs/storage.md index e625e9c225..e04ce027bf 100644 --- a/docs/storage.md +++ b/docs/storage.md @@ -61,10 +61,10 @@ A Prometheus server's data directory looks something like this: Note that a limitation of local storage is that it is not clustered or replicated. Thus, it is not arbitrarily scalable or durable in the face of drive or node outages and should be managed like any other single node -database. +database. -[Snapshots](querying/api.md#snapshot) are recommended for backups. Backups -made without snapshots run the risk of losing data that was recorded since +[Snapshots](querying/api.md#snapshot) are recommended for backups. Backups +made without snapshots run the risk of losing data that was recorded since the last WAL sync, which typically happens every two hours. With proper architecture, it is possible to retain years of data in local storage. @@ -75,14 +75,14 @@ performance, and efficiency. For further details on file format, see [TSDB format](/tsdb/docs/format/README.md). -## Compaction +### Compaction The initial two-hour blocks are eventually compacted into longer blocks in the background. Compaction will create larger blocks containing data spanning up to 10% of the retention time, or 31 days, whichever is smaller. -## Operational aspects +### Operational aspects Prometheus has several flags that configure local storage. The most important are: @@ -134,16 +134,16 @@ will be used. Expired block cleanup happens in the background. It may take up to two hours to remove expired blocks. Blocks must be fully expired before they are removed. -## Right-Sizing Retention Size +### Right-Sizing Retention Size -If you are utilizing `storage.tsdb.retention.size` to set a size limit, you -will want to consider the right size for this value relative to the storage you -have allocated for Prometheus. It is wise to reduce the retention size to provide -a buffer, ensuring that older entries will be removed before the allocated storage +If you are utilizing `storage.tsdb.retention.size` to set a size limit, you +will want to consider the right size for this value relative to the storage you +have allocated for Prometheus. It is wise to reduce the retention size to provide +a buffer, ensuring that older entries will be removed before the allocated storage for Prometheus becomes full. -At present, we recommend setting the retention size to, at most, 80-85% of your -allocated Prometheus disk space. This increases the likelihood that older entries +At present, we recommend setting the retention size to, at most, 80-85% of your +allocated Prometheus disk space. This increases the likelihood that older entries will be removed prior to hitting any disk limitations. ## Remote storage integrations
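+
+For example, a minimal remote write configuration needs only the write
+endpoint of the receiving system (the URL below is a placeholder):
+
+```yaml
+remote_write:
+  - url: "https://remote-storage.example.org/api/v1/write"
+```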