mirror of
https://github.com/prometheus/prometheus.git
synced 2025-08-06 22:27:17 +02:00
The new docs site will have syntax highlighting, so this adds language tags to code boxes that are currently missing them. I didn't add `promql` as a language yet since the highlighter doesn't support it yet, plus a lot of the PromQL codeboxes in our docs aren't strictly valid PromQL, they are more like multiple expressions listed in the same code box on multiple lines. So I'm leaving that for sometime later. In the HTTP API page, I moved the curl examples from the JSON codeboxes to their own ones above the JSON output. I considered putting an "Output:" text between the curl + JSON output, but I think the way it currently looks without it is probably fine. I also fixed a number of headings which were at the wrong level relative to their nesting in the document. I also removed `go` as a language from the Go template language examples, because the Go template language isn't Go at all. I also adjusted the indent on one codebox to be more reasonable (2 spaces instead of 8). And then finally, my editor made a bunch of whitespace changes automatically, like removing trailing spaces. Signed-off-by: Julius Volz <julius.volz@gmail.com> Signed-off-by: Julius Volz <julius.volz@gmail.com>
271 lines
9.2 KiB
Markdown
---
title: Getting started
sort_rank: 1
---

# Getting started

This guide is a "Hello World"-style tutorial which shows how to install,
configure, and use a simple Prometheus instance. You will download and run
Prometheus locally, configure it to scrape itself and an example application,
then work with queries, rules, and graphs to use collected time
series data.

## Downloading and running Prometheus

[Download the latest release](https://prometheus.io/download) of Prometheus for
your platform, then extract it and change into the new directory:

```bash
tar xvfz prometheus-*.tar.gz
cd prometheus-*
```

Before starting Prometheus, let's configure it.

## Configuring Prometheus to monitor itself

Prometheus collects metrics from _targets_ by scraping metrics HTTP
endpoints. Since Prometheus exposes data in the same
manner about itself, it can also scrape and monitor its own health.

While a Prometheus server that collects only data about itself is not very
useful, it is a good starting example. Save the following basic
Prometheus configuration as a file named `prometheus.yml`:

```yaml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']
```

For a complete specification of configuration options, see the
[configuration documentation](configuration/configuration.md).

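Before starting the server, it can help to check the file for syntax errors. The release archive also contains a `promtool` binary. The sketch below assumes it sits in the current directory next to `prometheus.yml`; the `PROMTOOL` variable and the `check_prometheus_config` helper name are illustrative, not part of Prometheus:

```bash
# Path to promtool; assumed to be in the extracted release directory.
PROMTOOL="${PROMTOOL:-./promtool}"

# Hypothetical helper: validate a configuration file (default: prometheus.yml).
# Returns promtool's exit status, which is non-zero if the file is invalid.
check_prometheus_config() {
  "${PROMTOOL}" check config "${1:-prometheus.yml}"
}

# Usage, from the extracted Prometheus directory:
#   check_prometheus_config prometheus.yml
```
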
## Starting Prometheus

To start Prometheus with your newly created configuration file, change to the
directory containing the Prometheus binary and run:

```bash
# Start Prometheus.
# By default, Prometheus stores its database in ./data (flag --storage.tsdb.path).
./prometheus --config.file=prometheus.yml
```

Prometheus should start up. You should also be able to browse to a status page
about itself at [localhost:9090](http://localhost:9090). Give it a couple of
seconds to collect data about itself from its own HTTP metrics endpoint.

You can also verify that Prometheus is serving metrics about itself by
navigating to its metrics endpoint:
[localhost:9090/metrics](http://localhost:9090/metrics)

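If you prefer to check from a terminal rather than a browser, the server's readiness endpoint (`/-/ready`) can be polled with `curl`. This is a minimal sketch assuming `curl` is installed and Prometheus listens on the default `localhost:9090`; the `PROM_URL` variable and `prometheus_is_ready` helper are illustrative names:

```bash
# Base URL of the Prometheus server started above (assumed default address).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: succeeds only if the readiness endpoint returns
# a successful HTTP status; fails on HTTP errors or if nothing is listening.
prometheus_is_ready() {
  curl --silent --fail --output /dev/null "${PROM_URL}/-/ready"
}

# Usage: wait until Prometheus is ready to serve traffic.
#   until prometheus_is_ready; do sleep 1; done
```
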
## Using the expression browser

Let us explore data that Prometheus has collected about itself. To
use Prometheus's built-in expression browser, navigate to
http://localhost:9090/graph and choose the "Table" view within the "Graph" tab.

As you can gather from [localhost:9090/metrics](http://localhost:9090/metrics),
one metric that Prometheus exports about itself is named
`prometheus_target_interval_length_seconds` (the actual amount of time between
target scrapes). Enter the following expression into the expression console and
then click "Execute":

```
prometheus_target_interval_length_seconds
```

This should return a number of different time series (along with the latest value
recorded for each), each with the metric name
`prometheus_target_interval_length_seconds`, but with different labels. These
labels designate different latency percentiles and target group intervals.

If we are interested only in 99th percentile latencies, we could use this
query:

```
prometheus_target_interval_length_seconds{quantile="0.99"}
```

To count the number of returned time series, you could write:

```
count(prometheus_target_interval_length_seconds)
```

For more about the expression language, see the
[expression language documentation](querying/basics.md).

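The same expressions can also be evaluated outside the browser through the HTTP API's instant query endpoint (`/api/v1/query`). The sketch below assumes `curl` is installed and the server runs on `localhost:9090`; `PROM_URL` and `prom_query` are illustrative names, not part of Prometheus:

```bash
# Base URL of the local Prometheus server (assumed default address).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: run an instant query and print the JSON response.
# --data-urlencode takes care of escaping braces and quotes in the expression.
prom_query() {
  curl --silent --get "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Usage:
#   prom_query 'prometheus_target_interval_length_seconds{quantile="0.99"}'
#   prom_query 'count(prometheus_target_interval_length_seconds)'
```
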
## Using the graphing interface

To graph expressions, navigate to http://localhost:9090/graph and use the "Graph"
tab.

For example, enter the following expression to graph the per-second rate of chunks
being created in the self-scraped Prometheus:

```
rate(prometheus_tsdb_head_chunks_created_total[1m])
```

Experiment with the graph range parameters and other settings.

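A graph is a query evaluated at many points over a time range. The HTTP API exposes this as a range query (`/api/v1/query_range`) taking `query`, `start`, `end`, and `step` parameters. The following sketch assumes `curl` and a server on `localhost:9090`; the `prom_query_range` helper, its default 15-minute window, and the fixed 15-second step are illustrative choices:

```bash
# Base URL of the local Prometheus server (assumed default address).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: evaluate an expression over the last N seconds
# (default 900, i.e. 15 minutes) with one data point every 15 seconds.
prom_query_range() {
  local expr="$1" window="${2:-900}"
  local end start
  end="$(date +%s)"          # now, as a Unix timestamp
  start="$((end - window))"  # beginning of the window
  curl --silent --get "${PROM_URL}/api/v1/query_range" \
    --data-urlencode "query=${expr}" \
    --data-urlencode "start=${start}" \
    --data-urlencode "end=${end}" \
    --data-urlencode "step=15s"
}

# Usage:
#   prom_query_range 'rate(prometheus_tsdb_head_chunks_created_total[1m])'
```
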
## Starting up some sample targets

Let's add additional targets for Prometheus to scrape.

The Node Exporter is used as an example target. For more information on using it,
[see these instructions](https://prometheus.io/docs/guides/node-exporter/).

```bash
tar -xzvf node_exporter-*.*.tar.gz
cd node_exporter-*.*

# Start 3 example targets in separate terminals:
./node_exporter --web.listen-address 127.0.0.1:8080
./node_exporter --web.listen-address 127.0.0.1:8081
./node_exporter --web.listen-address 127.0.0.1:8082
```

You should now have example targets listening on http://localhost:8080/metrics,
http://localhost:8081/metrics, and http://localhost:8082/metrics.

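To confirm that all three exporters are answering before pointing Prometheus at them, you could probe each `/metrics` endpoint. This is a small sketch assuming `curl` is installed; the `check_targets` helper name is illustrative:

```bash
# Hypothetical helper: for each given port, report whether
# http://localhost:<port>/metrics answers with a successful HTTP status.
check_targets() {
  local port
  for port in "$@"; do
    if curl --silent --fail --output /dev/null "http://localhost:${port}/metrics"; then
      echo "port ${port}: up"
    else
      echo "port ${port}: down"
    fi
  done
}

# Usage:
#   check_targets 8080 8081 8082
```
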
## Configure Prometheus to monitor the sample targets

Now we will configure Prometheus to scrape these new targets. Let's group all
three endpoints into one job called `node`. We will imagine that the
first two endpoints are production targets, while the third one represents a
canary instance. To model this in Prometheus, we can add several groups of
endpoints to a single job, adding extra labels to each group of targets. In
this example, we will add the `group="production"` label to the first group of
targets, while adding `group="canary"` to the second.

To achieve this, add the following job definition to the `scrape_configs`
section in your `prometheus.yml` and restart your Prometheus instance:

```yaml
scrape_configs:
  - job_name: 'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'
```

Go to the expression browser and verify that Prometheus now has information
about time series that these example endpoints expose, such as `node_cpu_seconds_total`.

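Another way to check that the new job was picked up is the HTTP API's targets endpoint (`/api/v1/targets`), which lists what Prometheus is currently scraping. The sketch below assumes `curl` and the default `localhost:9090` address; `PROM_URL` and `list_active_targets` are illustrative names:

```bash
# Base URL of the local Prometheus server (assumed default address).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: print the active scrape targets as JSON.
list_active_targets() {
  curl --silent --get "${PROM_URL}/api/v1/targets" --data-urlencode "state=active"
}

# Usage:
#   list_active_targets
```

Each entry under `data.activeTargets` in the response carries the target's `labels`, which should include the `group` label configured above, and a `health` field.
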
## Configure rules for aggregating scraped data into new time series

Though not a problem in our example, queries that aggregate over thousands of
time series can get slow when computed ad-hoc. To make this more efficient,
Prometheus can prerecord expressions into new persisted
time series via configured _recording rules_. Let's say we are interested in
recording the per-second rate of CPU time (`node_cpu_seconds_total`) averaged
over all CPUs per instance (but preserving the `job`, `instance` and `mode`
dimensions) as measured over a window of 5 minutes. We could write this as:

```
avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))
```

Try graphing this expression.

To record the time series resulting from this expression into a new metric
called `job_instance_mode:node_cpu_seconds:avg_rate5m`, create a file
with the following recording rule and save it as `prometheus.rules.yml`:

```yaml
groups:
- name: cpu-node
  rules:
  - record: job_instance_mode:node_cpu_seconds:avg_rate5m
    expr: avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))
```

To make Prometheus pick up this new rule, add a `rule_files` statement in your `prometheus.yml`. The config should now
look like this:

```yaml
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # Evaluate rules every 15 seconds.

  # Attach these extra labels to all timeseries collected by this Prometheus instance.
  external_labels:
    monitor: 'codelab-monitor'

rule_files:
  - 'prometheus.rules.yml'

scrape_configs:
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'
```

Restart Prometheus with the new configuration and verify that a new time series
with the metric name `job_instance_mode:node_cpu_seconds:avg_rate5m`
is now available by querying it through the expression browser or graphing it.

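A rule file with a YAML or expression error will keep the new series from appearing, so it can be worth validating the file before restarting. This sketch assumes the `promtool` binary from the release archive is in the current directory; `PROMTOOL` and `check_prometheus_rules` are illustrative names:

```bash
# Path to promtool; assumed to be in the extracted release directory.
PROMTOOL="${PROMTOOL:-./promtool}"

# Hypothetical helper: validate a rule file (default: prometheus.rules.yml).
# Returns promtool's exit status, which is non-zero if the file is invalid.
check_prometheus_rules() {
  "${PROMTOOL}" check rules "${1:-prometheus.rules.yml}"
}

# Usage:
#   check_prometheus_rules prometheus.rules.yml
```
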
## Reloading configuration

As mentioned in the [configuration documentation](configuration/configuration.md), a
Prometheus instance can have its configuration reloaded without restarting the
process by using the `SIGHUP` signal. If you're running on Linux, this can be
performed by using `kill -s SIGHUP <PID>`, replacing `<PID>` with your Prometheus
process ID.

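The lookup of the process ID and the signal can be combined in a small helper. This is a sketch for Linux with bash, assuming `pgrep` is available and that exactly one process named `prometheus` is running; the `reload_prometheus` name is illustrative:

```bash
# Hypothetical helper: send SIGHUP to the running Prometheus process.
# Assumes a single local process whose name is exactly "prometheus".
reload_prometheus() {
  local pid
  pid="$(pgrep -x prometheus)" || {
    echo "prometheus is not running" >&2
    return 1
  }
  kill -s SIGHUP "${pid}"
}

# Usage:
#   reload_prometheus
```
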
## Shutting down your instance gracefully

While Prometheus does have recovery mechanisms in the case that there is an
abrupt process failure, it is recommended to use signals or interrupts for a
clean shutdown of a Prometheus instance. On Linux, this can be done by sending
the `SIGTERM` or `SIGINT` signals to the Prometheus process. For example, you
can use `kill -s <SIGNAL> <PID>`, replacing `<SIGNAL>` with the signal name
and `<PID>` with the Prometheus process ID. Alternatively, you can press the
interrupt character at the controlling terminal, which by default is `^C` (Control-C).