Adding the following across the project:
```bash
# exit immediately when a command fails
set -e
# only exit with zero if all commands of the pipeline exit successfully
set -o pipefail
# error on unset variables
set -u
```
kube-prometheus
Note that everything in the contrib/kube-prometheus/ directory is experimental and may change significantly at any time.
This repository collects Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.
The content of this project is written in jsonnet. This project can be described both as a package and as a library.
Components included in this package:
- The Prometheus Operator
- Highly available Prometheus
- Highly available Alertmanager
- Prometheus node-exporter
- kube-state-metrics
- Grafana
This stack is meant for cluster monitoring, so it is pre-configured to collect metrics from all Kubernetes components. In addition, it delivers a default set of dashboards and alerting rules. Many of the useful dashboards and alerts come from the kubernetes-mixin project. Similar to this project, it provides composable jsonnet as a library for users to customize to their needs.
Table of contents
- Prerequisites
- Quickstart
- Usage
- Configuration
- Customization
- Minikube Example
- Troubleshooting
- Contributing
Prerequisites
You will need a Kubernetes cluster, that's it! By default it is assumed that the kubelet uses token authN and authZ. Otherwise Prometheus needs a client certificate, which gives it full access to the kubelet rather than just the metrics. Token authN and authZ allow more fine-grained and easier access control.
This means the kubelet configuration must contain these flags:
- --authentication-token-webhook=true This flag enables a ServiceAccount token to be used to authenticate against the kubelet(s).
- --authorization-mode=Webhook This flag makes the kubelet perform an RBAC request against the API to determine whether the requesting entity (Prometheus in this case) is allowed to access a resource, specifically for this project the /metrics endpoint.
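One way to sanity-check a node is to scan the kubelet's command line for both flags. The following is a minimal sketch and not part of this project: the function name has_required_kubelet_flags is hypothetical, and it assumes the flags are passed on the command line in exactly the form shown above, so a kubelet configured through a config file would not be detected.

```shell
# Hypothetical helper (not part of kube-prometheus): succeeds only if the
# given kubelet argument string contains both flags in the exact form above.
has_required_kubelet_flags() {
  args=" $1 "
  rc=0
  for flag in --authentication-token-webhook=true --authorization-mode=Webhook; do
    case "$args" in
      *" $flag "*) ;;  # flag present, nothing to report
      *) echo "missing kubelet flag: $flag" >&2; rc=1 ;;
    esac
  done
  return "$rc"
}
```

On a Linux node with procps installed, it could for instance be called as has_required_kubelet_flags "$(ps -o args= -C kubelet)"; how the kubelet is actually launched in your cluster is an assumption to verify.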
minikube
In order to just try out this stack, start minikube with the following command:
$ minikube delete && minikube start --kubernetes-version=v1.10.1 --memory=4096 --bootstrapper=kubeadm --extra-config=kubelet.authentication-token-webhook=true --extra-config=kubelet.authorization-mode=Webhook --extra-config=scheduler.address=0.0.0.0 --extra-config=controller-manager.address=0.0.0.0
Quickstart
Although this project is intended to be used as a library, a compiled version of the Kubernetes manifests generated with this library is checked into this repository in order to try the content out quickly.
Simply create the stack:
$ kubectl create -f manifests/
Usage
The content of this project consists of a set of jsonnet files making up a library to be consumed.
Install this library in your own project with jsonnet-bundler:
$ mkdir my-kube-prometheus; cd my-kube-prometheus
$ jb init
$ jb install github.com/coreos/prometheus-operator/contrib/kube-prometheus/jsonnet/kube-prometheus
jb can be installed with go get github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb
You may wish to not use ksonnet and simply render the generated manifests to files on disk. This can be done with:
local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + {
  _config+:: {
    namespace: 'monitoring',
  },
};
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
This renders all manifests in a json structure of {filename: manifest-content}.
Compiling
To compile the above and get each manifest in a separate file on disk use the following script:
#!/usr/bin/env bash
set -e
set -x
# only exit with zero if all commands of the pipeline exit successfully
set -o pipefail
# Make sure to start with a clean 'manifests' dir
rm -rf manifests
mkdir manifests
# the conversion is optional, but we would like to generate yaml, not json
jsonnet -J vendor -m manifests "${1-example.jsonnet}" | xargs -I{} sh -c 'cat {} | gojsontoyaml > {}.yaml; rm -f {}' -- {}
Note you need jsonnet and gojsontoyaml (go get github.com/brancz/gojsontoyaml) installed. If you just want json output, not yaml, then you can skip the pipe and everything afterwards.
This script reads each key of the generated json and uses that as the file name, and writes the value of that key to that file.
You can also run this script by executing the command make generate-raw from the kube-prometheus base directory of this repository, but the above option is recommended so that you run it in your own infrastructure repository.
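The conversion step of the script above can also be written as an explicit loop, which may be easier to adapt than the xargs one-liner. This is only a sketch: convert_manifests is a hypothetical name, and it assumes the converter reads JSON on stdin and writes to stdout, as gojsontoyaml does in the script above.

```shell
# Sketch only: convert_manifests is a hypothetical name, not part of this project.
# Reads file names (one per line, as printed by `jsonnet -m`) on stdin, pipes
# each file through the converter command given as $1, writes <file>.yaml,
# and removes the original only if the conversion succeeded.
convert_manifests() {
  converter="$1"
  while IFS= read -r file; do
    if "$converter" < "$file" > "$file.yaml"; then
      rm -f "$file"
    else
      echo "conversion failed for $file" >&2
      rm -f "$file.yaml"
      return 1
    fi
  done
}
```

With the tools from the note above installed, a call could look like: jsonnet -J vendor -m manifests example.jsonnet | convert_manifests gojsontoyaml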
Configuration
A hidden _config field is located at the top level of the object this library provides. These are the available fields with their respective default values:
{
    _config+:: {
        namespace: "default",
        versions+:: {
            alertmanager: "v0.14.0",
            nodeExporter: "v0.15.2",
            kubeStateMetrics: "v1.3.0",
            kubeRbacProxy: "v0.3.0",
            addonResizer: "1.0",
            prometheusOperator: "v0.18.1",
            prometheus: "v2.2.1",
        },
        imageRepos+:: {
            prometheus: "quay.io/prometheus/prometheus",
            alertmanager: "quay.io/prometheus/alertmanager",
            kubeStateMetrics: "quay.io/coreos/kube-state-metrics",
            kubeRbacProxy: "quay.io/coreos/kube-rbac-proxy",
            addonResizer: "quay.io/coreos/addon-resizer",
            nodeExporter: "quay.io/prometheus/node-exporter",
            prometheusOperator: "quay.io/coreos/prometheus-operator",
        },
        prometheus+:: {
            replicas: 2,
            rules: {},
        },
        alertmanager+:: {
            config: alertmanagerConfig,
            replicas: 3,
        },
    },
}
The Grafana definition is located in a different project (https://github.com/brancz/kubernetes-grafana), but the needed configuration can be customized from the same file. For example, to allow anonymous access to Grafana, add the following to the _config section:
      grafana+:: {
        config: {
          sections: {
            "auth.anonymous": {enabled: true},
          },
        },
      },
Customization
Jsonnet is a Turing-complete language, so any logic can be expressed in it. It also has powerful merge functionality, allowing sophisticated customizations of any kind simply by merging them into the object the library provides.
A common example is that not all Kubernetes clusters are created exactly the same way, meaning the configuration to monitor them may be slightly different. For kubeadm, bootkube and kops clusters there are mixins available to easily configure these:
kubeadm:
(import 'kube-prometheus/kube-prometheus.libsonnet') +
(import 'kube-prometheus/kube-prometheus-kubeadm.libsonnet')
bootkube:
(import 'kube-prometheus/kube-prometheus.libsonnet') +
(import 'kube-prometheus/kube-prometheus-bootkube.libsonnet')
kops:
(import 'kube-prometheus/kube-prometheus.libsonnet') +
(import 'kube-prometheus/kube-prometheus-kops.libsonnet')
Another mixin that may be useful for exploring the stack is to expose the UIs of Prometheus, Alertmanager and Grafana on NodePorts:
(import 'kube-prometheus/kube-prometheus.libsonnet') +
(import 'kube-prometheus/kube-prometheus-node-ports.libsonnet')
For example the name of the Prometheus object provided by this library can be overridden:
((import 'kube-prometheus/kube-prometheus.libsonnet') + {
   prometheus+: {
     prometheus+: {
       metadata+: {
         name: 'my-name',
       },
     },
   },
 }).prometheus.prometheus
Standard Kubernetes manifests are all written using ksonnet-lib, so they can be modified with the mixins supplied by ksonnet-lib. For example to override the namespace of the node-exporter DaemonSet:
local k = import 'ksonnet/ksonnet.beta.3/k.libsonnet';
local daemonset = k.apps.v1beta2.daemonSet;
((import 'kube-prometheus/kube-prometheus.libsonnet') + {
   nodeExporter+: {
     daemonset+:
       daemonset.mixin.metadata.withNamespace('my-custom-namespace'),
   },
 }).nodeExporter.daemonset
Alertmanager configuration
The Alertmanager configuration is located in the _config.alertmanager.config configuration field. In order to set a custom Alertmanager configuration simply set this field.
((import 'kube-prometheus/kube-prometheus.libsonnet') + {
   _config+:: {
     alertmanager+: {
       config: |||
         global:
           resolve_timeout: 10m
         route:
           group_by: ['job']
           group_wait: 30s
           group_interval: 5m
           repeat_interval: 12h
           receiver: 'null'
           routes:
           - match:
               alertname: DeadMansSwitch
             receiver: 'null'
         receivers:
         - name: 'null'
       |||,
     },
   },
 }).alertmanager.secret
In the above example the configuration has been inlined, but can just as well be an external file imported in jsonnet via the importstr function.
((import 'kube-prometheus/kube-prometheus.libsonnet') + {
   _config+:: {
     alertmanager+: {
       config: importstr 'alertmanager-config.yaml',
     },
   },
 }).alertmanager.secret
Adding additional namespaces to monitor
In order to monitor additional namespaces, the Prometheus server requires the appropriate Role and RoleBinding to be able to discover targets from that namespace. By default the Prometheus server is limited to the three namespaces it requires: default, kube-system and the namespace you configure the stack to run in via $._config.namespace. This is specified in $._config.prometheus.namespaces. To monitor additional namespaces, simply append them:
local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') + {
  _config+:: {
    namespace: 'monitoring',
    prometheus+:: {
      namespaces+: ['my-namespace', 'my-second-namespace'],
    },
  },
};
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
Static etcd configuration
In order to configure a static etcd cluster to scrape there is a simple mixin prepared, so only the IPs and certificate information need to be configured. Simply append the kube-prometheus/kube-prometheus-static-etcd.libsonnet mixin to the rest of the configuration, and configure the ips to be the IPs to scrape, and the clientCA, clientKey and clientCert to values that are valid to scrape etcd metrics with.
Most likely these certificates are generated somewhere in an infrastructure repository, so using the jsonnet importstr function can be useful here. All the sensitive information on the certificates will end up in a Kubernetes Secret.
local kp = (import 'kube-prometheus/kube-prometheus.libsonnet') +
           (import 'kube-prometheus/kube-prometheus-static-etcd.libsonnet') + {
  _config+:: {
    namespace: 'monitoring',
    etcd+:: {
      ips: ['127.0.0.1'],
      clientCA: importstr 'etcd-client-ca.crt',
      clientKey: importstr 'etcd-client.key',
      clientCert: importstr 'etcd-client.crt',
      serverName: 'etcd.my-cluster.local',
    },
  },
};
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
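Since the example above pulls the certificate material in with importstr, compilation depends on those files being present. A small pre-flight check can surface a missing file before jsonnet is run. This is a sketch under assumptions: check_etcd_client_files is a hypothetical helper, and it assumes the three files use the names from the example and sit together in one directory.

```shell
# Hypothetical pre-flight check (not part of kube-prometheus): verify that the
# files referenced via importstr in the example above exist and are non-empty.
# $1 is the directory containing them (defaults to the current directory).
check_etcd_client_files() {
  dir="${1:-.}"
  rc=0
  for f in etcd-client-ca.crt etcd-client.key etcd-client.crt; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

The check only looks at presence and size; it does not validate that the certificates are accepted by etcd.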
Customizing Prometheus alerting/recording rules and Grafana dashboards
See developing Prometheus rules and Grafana dashboards guide.
Exposing Prometheus/Alertmanager/Grafana via Ingress
See exposing Prometheus/Alertmanager/Grafana guide.
Minikube Example
To use an easy to reproduce example, let's take the minikube setup as demonstrated in prerequisites. It is a kubeadm cluster (as we use the kubeadm bootstrapper) and because we would like easy access to our Prometheus, Alertmanager and Grafana UI we want the services to be exposed as NodePort type services:
Note that using NodePort type services is likely not a good idea for your production use case; it is only used for demonstration purposes here.
local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-kubeadm.libsonnet') +
  (import 'kube-prometheus/kube-prometheus-node-ports.libsonnet') +
  {
    _config+:: {
      namespace: 'monitoring',
    },
  };
{ ['00namespace-' + name]: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name]: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name]: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name]: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name]: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name]: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['grafana-' + name]: kp.grafana[name] for name in std.objectFields(kp.grafana) }
Troubleshooting
Error retrieving kubelet metrics
Should the Prometheus /targets page show kubelet targets but be unable to successfully scrape their metrics, then most likely it is a problem with the authentication and authorization setup of the kubelets.
As described in the prerequisites section, in order to retrieve metrics from the kubelet, token authentication and authorization must be enabled. Some Kubernetes setup tools do not enable this by default.
If you are using Google's GKE product, see [docs/GKE-cadvisor-support.md].
Authentication problem
The Prometheus /targets page will show the kubelet job with the error 403 Unauthorized when token authentication is not enabled. Ensure that the --authentication-token-webhook=true flag is enabled on all kubelet configurations.
Authorization problem
The Prometheus /targets page will show the kubelet job with the error 401 Unauthorized, when token authorization is not enabled. Ensure that the --authorization-mode=Webhook flag is enabled on all kubelet configurations.
kube-state-metrics resource usage
In some environments, kube-state-metrics may need additional resources. One driver for higher resource needs is a high number of namespaces. There may be others.
kube-state-metrics resource allocation is managed by addon-resizer. You can control its parameters by setting variables in the config. They default to:
    kubeStateMetrics+:: {
      baseCPU: '100m',
      cpuPerNode: '2m',
      baseMemory: '150Mi',
      memoryPerNode: '30Mi',
    }
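As a rough illustration of how these defaults scale with cluster size, the sketch below assumes addon-resizer sizes the container as the base amount plus the per-node amount multiplied by the number of nodes. That linear formula is an assumption about addon-resizer's behavior rather than something stated in this document, and the function names are hypothetical.

```shell
# Defaults copied from the kubeStateMetrics config above, without units.
BASE_CPU_M=100         # baseCPU: '100m'
CPU_PER_NODE_M=2       # cpuPerNode: '2m'
BASE_MEMORY_MI=150     # baseMemory: '150Mi'
MEMORY_PER_NODE_MI=30  # memoryPerNode: '30Mi'

# Assumed formula: base + perNode * nodes. $1 is the number of nodes.
ksm_cpu() { echo "$(( $BASE_CPU_M + $CPU_PER_NODE_M * $1 ))m"; }
ksm_memory() { echo "$(( $BASE_MEMORY_MI + $MEMORY_PER_NODE_MI * $1 ))Mi"; }
```

Under that assumption, ksm_cpu 10 prints 120m and ksm_memory 10 prints 450Mi for a 10-node cluster.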
Contributing
All .yaml files in the /manifests folder are generated via
Jsonnet. Contributing changes will most likely include
the following process:
- Make your changes in the respective *.jsonnet file.
- Commit your changes (This is currently necessary due to our vendoring process. This is likely to change in the future).
- Generate dependent *.yaml files: make generate-in-docker.
- Commit the generated changes.