The current implementation of prometheus-adapter exposes a lot of
metrics about the health of its aggregated apiserver. The issue is that
the some of these metrics are not very useful in the context of
prometheus-adapter, and we currently can't avoid exposing them since
they are registered to the Kubernetes global Prometheus registry. Until
this is improved in upstream Kubernetes, we could benefit from dropping
some of the metrics that are not very useful.
Before this change, in a default kube-prometheus installation, we would
have 800+ series for prometheus-adapter against 400+, so we divided the
number of series by two will focusing on the most valuable metrics for
prometheus-adapter.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
In version [node-exporter v1.2.0](https://github.com/prometheus/node_exporter/releases/tag/v1.2.0)
two argument name changes were introduced. While the old names still
work (with a deprecation warning), lets use the new names.
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
This commit pulls latest changes from thanos mixins and sets `thanosPrometheusCommonDimensions`
to `namespace, pod` for k8s use case.
Refer https://github.com/thanos-io/thanos/pull/4508 for more details.
Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>
This change drops pod-centric metrics without a non-empty 'container' label.
Previously we dropped pod-centric metrics without a (pod, namespace) label set
however these can be critical for debugging.
Reduce threshold of NodeFilesystemSpaceFillingUp warning alert to 20% space available, instead of 40% (default).
This will align the threshold according to default kubelet GC values
below[1],
"imageMinimumGCAge": "2m0s",
"imageGCHighThresholdPercent": 85,
"imageGCLowThresholdPercent": 80,
[1] https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/
Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>
The following provides a description and cardinality estimation based on the tests in a local cluster:
container_blkio_device_usage_total - useful for containers, but not for system services (nodes*disks*services*operations*2)
container_fs_.* - add filesystem read/write data (nodes*disks*services*4)
container_file_descriptors - file descriptors limits and global numbers are exposed via (nodes*services)
container_threads_max - max number of threads in cgroup. Usually for system services it is not limited (nodes*services)
container_threads - used threads in cgroup. Usually not important for system services (nodes*services)
container_sockets - used sockets in cgroup. Usually not important for system services (nodes*services)
container_start_time_seconds - container start. Possibly not needed for system services (nodes*services)
container_last_seen - Not needed as system services are always running (nodes*services)
container_spec_.* - Everything related to cgroup specification and thus static data (nodes*services*5)