864 Commits

Author SHA1 Message Date
Haoyu Sun
e7d2135cea
replace deprecated "app" label selector by "app.kubernetes.io/name" in
documents and examples
2021-10-08 14:00:52 +02:00
Haoyu Sun
b6c97fc6c0
remove "app" label selector deprecated by prometheus-operator 2021-10-05 19:59:39 +02:00
dgrisonnet
fe374485a1 [bot] [main] Automated version update 2021-10-04 07:39:27 +00:00
Damien Grisonnet
374413f10a
Merge pull request #1409 from dgrisonnet/drop-pa-metrics
Drop some of the metrics exposed by prometheus-adapter
2021-09-30 17:45:15 +02:00
Damien Grisonnet
5ebbb65276 jsonnet: drop some of prometheus-adapter metrics
The current implementation of prometheus-adapter exposes a lot of
metrics about the health of its aggregated apiserver. The issue is that
some of these metrics are not very useful in the context of
prometheus-adapter, and we currently can't avoid exposing them since
they are registered to the Kubernetes global Prometheus registry. Until
this is improved in upstream Kubernetes, we could benefit from dropping
some of the metrics that are not very useful.

Before this change, a default kube-prometheus installation had 800+
series for prometheus-adapter, against 400+ after it, so we halved the
number of series while focusing on the most valuable metrics for
prometheus-adapter.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-09-29 13:02:00 +02:00
Jan Fajerski
6fa097c0ed jsonnet/node-exporter: adjust to node-exporter v1.2.0 arg name change
In [node-exporter v1.2.0](https://github.com/prometheus/node_exporter/releases/tag/v1.2.0),
two arguments were renamed. While the old names still
work (with a deprecation warning), let's use the new names.

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2021-09-29 11:15:10 +02:00
Damien Grisonnet
a2eee1803a
Merge pull request #1404 from prometheus-operator/automated-updates-main
[bot] [main] Automated version update
2021-09-28 11:56:30 +02:00
simonpasquier
9a3d5d42e1 [bot] [main] Automated version update 2021-09-28 08:41:28 +00:00
Philip Gough
58e2c131c2 Keep 'container_fs_.*' metrics from cAdvisor 2021-09-27 17:13:00 +01:00
dgrisonnet
02776a1d37 [bot] [main] Automated version update 2021-09-27 09:53:31 +00:00
Arunprasad Rajkumar
c5d265a14e
thanos: bump to latest and add thanosPrometheusCommonDimensions
This commit pulls the latest changes from the thanos mixins and sets `thanosPrometheusCommonDimensions`
to `namespace, pod` for the k8s use case.

Refer to https://github.com/thanos-io/thanos/pull/4508 for more details.

Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>
2021-09-27 12:07:08 +05:30
Philip Gough
56f96e6389 Adjust dropped metrics from cAdvisor
This change drops pod-centric metrics without a non-empty 'container' label.

Previously, we dropped pod-centric metrics without a (pod, namespace) label set;
however, these can be critical for debugging.
2021-09-24 17:24:01 +01:00
Damien Grisonnet
7f1092cdde
Merge pull request #1344 from PhilipGough/MON-1085
jsonnet: Support scraping the config-reloader for AlertManager and Pr…
2021-09-22 16:16:48 +02:00
Philip Gough
7b32afb8aa jsonnet: Support scraping the config-reloader for AlertManager and Prometheus 2021-09-22 14:54:12 +01:00
dgrisonnet
a232cca3b6 [bot] [main] Automated version update 2021-09-20 07:39:09 +00:00
Sylvain Pasche
6d5c1b793c Always generate grafana-config secret
Since https://github.com/brancz/kubernetes-grafana/pull/115, upstream
grafana contains a non-empty config. Generate the grafana-config secret
unconditionally even if no user config is passed.
2021-09-16 14:25:53 +02:00
dgrisonnet
6654c13142 [bot] [main] Automated version update 2021-09-13 07:39:05 +00:00
Damien Grisonnet
6f744e24a5
Merge pull request #1357 from arajkumar/adjust-NodeFilesystemSpaceFillingUp-warning-threshold
Adjust node filesystem space filling up warning threshold to 20%
2021-09-06 19:04:29 +02:00
Arunprasad Rajkumar
4de44139ec
add comments explaining the fsSpaceFilling threshold adjustment
Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>
2021-09-02 17:38:02 +05:30
Arunprasad Rajkumar
03471fd86f
Adjust threshold for SpaceFillingUp warning alert
Reduce threshold of NodeFilesystemSpaceFillingUp warning alert to 20% space available, instead of 40% (default).

This aligns the threshold with the default kubelet GC values
below [1]:

"imageMinimumGCAge": "2m0s",
"imageGCHighThresholdPercent": 85,
"imageGCLowThresholdPercent": 80,

[1] https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/

Signed-off-by: Arunprasad Rajkumar <arajkuma@redhat.com>
2021-09-01 13:29:36 +05:30
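
The reasoning in the commit above can be checked with a little arithmetic. The following is a minimal Python sketch, not the alert's actual expression: it assumes the warning can be reduced to a single "available space below threshold" check and ignores any fill-rate prediction the real rule may apply. The helper names `available_percent` and `warning_fires` are hypothetical.

```python
# Default kubelet image GC values quoted in the commit message above.
IMAGE_GC_HIGH_THRESHOLD_PERCENT = 85  # image GC is triggered above this disk usage
IMAGE_GC_LOW_THRESHOLD_PERCENT = 80   # image GC does not free space below this usage


def available_percent(used_percent: float) -> float:
    """Convert a disk usage percentage into an available-space percentage."""
    return 100.0 - used_percent


def warning_fires(available: float, threshold: float) -> bool:
    """Simplified stand-in (assumption) for the alert's space check."""
    return available < threshold


# Available space left once image GC has brought usage down to its low threshold.
gc_floor_available = available_percent(IMAGE_GC_LOW_THRESHOLD_PERCENT)

# A node sitting at 70% usage is below the GC high threshold, so the kubelet
# has no reason to clean up yet.
steady_state_available = available_percent(70.0)

for threshold in (40.0, 20.0):
    fires = warning_fires(steady_state_available, threshold)
    print(f"threshold={threshold:.0f}% available -> space check passes: {fires}")
```

Under these assumptions, the old 40% threshold would already pass the space check on a node at 70% usage, while the 20% threshold only does so once usage exceeds the point that image GC frees down to.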
dgrisonnet
a1c6a4e21d [bot] [main] Automated version update 2021-08-30 07:39:09 +00:00
simonpasquier
eb52023db2 [bot] [main] Automated version update 2021-08-25 09:37:24 +00:00
Abhilash Pallerlamudi
9e8926511f fix sync-to-internal-registry.jsonnet 2021-08-23 12:45:51 -07:00
Damien Grisonnet
9ef6dff167 jsonnet: unpin dependencies
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-20 13:49:12 +02:00
Damien Grisonnet
eca67844af jsonnet: pin and update jsonnet dependencies
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-19 16:41:53 +02:00
Damien Grisonnet
b5ec93208b jsonnet: drop deprecated etcd metric
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-18 17:27:50 +02:00
Damien Grisonnet
45adc03cfb jsonnet: update prometheus-adapter to v0.9.0
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-08-17 18:05:45 +02:00
paulfantom
c4113807fb
jsonnet: set thanos config to null by default
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-08-16 15:16:52 +02:00
paulfantom
ad3fc8920e [bot] [main] Automated version update 2021-08-16 08:04:51 +00:00
Dimitrije Manic
12cd7fd9ce Prometheus ruleSelector defaults to all rules 2021-08-11 10:16:24 -04:00
paulfantom
f6d6b30aed
jsonnet: use full dependency path 2021-08-06 14:15:23 +02:00
Damien Grisonnet
33cc694f18
Merge pull request #1308 from PaytmLabs/feature/separate-thanos-rules
Create Thanos Sidecar rules separately from Prometheus ones
2021-08-05 16:19:01 +02:00
Maxime Brunet
961f138dd0
Add back _config.runbookURLPattern for Thanos Sidecar rules 2021-08-04 14:22:06 -07:00
Paweł Krupa
54d8f88162
Merge pull request #1307 from PaytmLabs/feature/addons/aws-vpc-cni
Turn AWS VPC CNI into a control plane add-on
2021-08-04 09:56:50 +02:00
Paweł Krupa
e931a417fc
Merge pull request #1230 from Luis-TT/fix-kube-proxy-dashboard 2021-08-04 09:55:09 +02:00
Luis Vidal Ernst
0b49c3102d Added PodMonitor for kube-proxy 2021-08-03 08:31:49 +02:00
Maxime Brunet
0e7dc97bc5
Create Thanos Sidecar rules separately from Prometheus ones 2021-08-02 12:46:06 -07:00
Maxime Brunet
d3ccfb8220
Turn AWS VPC CNI into a control plane add-on 2021-08-02 11:26:33 -07:00
dgrisonnet
e97eb0fbe9 [bot] [main] Automated version update 2021-08-02 13:37:08 +00:00
Maxime Brunet
b7fe018d29
eks: Revert back to awscni_total_ip_addresses-based alert 2021-07-31 11:37:12 -07:00
Paweł Krupa
b9c73c7b29
Merge pull request #1283 from prashbnair/node-veth
changing node exporter ignore list
2021-07-28 09:17:03 +02:00
Prashant Balachandran
09fdac739d changing node exporter ignore list 2021-07-27 17:17:19 +05:30
Paweł Krupa
785789b776
Merge pull request #1257 from Luis-TT/kube-state-metrics-kubac-proxy-resources 2021-07-27 12:36:26 +02:00
lanmarti
ed48391831 Add resource requests and limits to prometheus-adapter container 2021-07-27 12:19:51 +02:00
Maxime Brunet
3a98a3478c
eks: Fix CNI metrics relabelings
Signed-off-by: Maxime Brunet <maxime.brunet@paytm.com>
2021-07-23 13:39:29 -07:00
Manuel Rüger
acd1eeba4c node.libsonnet: Fix small typo
Signed-off-by: Manuel Rüger <manuel@rueg.eu>
2021-07-22 19:14:24 +02:00
paulfantom
cfe830f8f0
jsonnet/kube-prometheus: point to runbooks.prometheus-operator.dev
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-07-22 17:30:57 +02:00
Luis Vidal Ernst
9c638162ae Allow customizing of kubeRbacProxy in kube-state-metrics 2021-07-21 13:57:05 +02:00
Paweł Krupa
acea5efd85
Merge pull request #1268 from paulfantom/alerts-best-practices
Alerts best practices
2021-07-21 09:32:32 +02:00
Philip Gough
463ad065d3 jsonnet: Drop cAdvisor metrics with no (pod, namespace) labels while preserving ability to monitor system services resource usage
The following provides a description and cardinality estimation based on the tests in a local cluster:

container_blkio_device_usage_total - useful for containers, but not for system services (nodes*disks*services*operations*2)
container_fs_.*                    - add filesystem read/write data (nodes*disks*services*4)
container_file_descriptors         - file descriptors limits and global numbers are exposed via (nodes*services)
container_threads_max              - max number of threads in cgroup. Usually for system services it is not limited (nodes*services)
container_threads                  - used threads in cgroup. Usually not important for system services (nodes*services)
container_sockets                  - used sockets in cgroup. Usually not important for system services (nodes*services)
container_start_time_seconds       - container start. Possibly not needed for system services (nodes*services)
container_last_seen                - Not needed as system services are always running (nodes*services)
container_spec_.*                  - Everything related to cgroup specification and thus static data (nodes*services*5)
2021-07-20 12:50:02 +01:00
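
The cardinality formulas in the commit above can be turned into a rough estimate for a given cluster. The following is a minimal Python sketch: the factors are copied from the commit message, while the cluster sizes in `sizes` and the helper name `estimate_series` are hypothetical and only serve as an illustration.

```python
from math import prod

# Cardinality factors as listed in the commit message above:
# metric pattern -> (dimensions to multiply, constant multiplier).
ESTIMATES = {
    "container_blkio_device_usage_total": (("nodes", "disks", "services", "operations"), 2),
    "container_fs_.*": (("nodes", "disks", "services"), 4),
    "container_file_descriptors": (("nodes", "services"), 1),
    "container_threads_max": (("nodes", "services"), 1),
    "container_threads": (("nodes", "services"), 1),
    "container_sockets": (("nodes", "services"), 1),
    "container_start_time_seconds": (("nodes", "services"), 1),
    "container_last_seen": (("nodes", "services"), 1),
    "container_spec_.*": (("nodes", "services"), 5),
}


def estimate_series(sizes: dict) -> dict:
    """Return the estimated number of series per metric pattern."""
    return {
        name: prod(sizes[dim] for dim in dims) * multiplier
        for name, (dims, multiplier) in ESTIMATES.items()
    }


# Hypothetical cluster: these numbers are assumptions, not taken from the commit.
sizes = {"nodes": 3, "disks": 2, "services": 10, "operations": 4}

per_metric = estimate_series(sizes)
for name, count in per_metric.items():
    print(f"{name:<36} {count}")
print(f"{'total':<36} {sum(per_metric.values())}")
```

With these assumed sizes, the per-disk metrics dominate the estimate, which is consistent with the commit singling them out for system services.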