73 Commits

Author SHA1 Message Date
Philip Gough
463ad065d3 jsonnet: Drop cAdvisor metrics with no (pod, namespace) labels while preserving ability to monitor system services resource usage
The following provides a description and cardinality estimation based on the tests in a local cluster:

container_blkio_device_usage_total - useful for containers, but not for system services (nodes*disks*services*operations*2)
container_fs_.*                    - add filesystem read/write data (nodes*disks*services*4)
container_file_descriptors         - file descriptors limits and global numbers are exposed via (nodes*services)
container_threads_max              - max number of threads in cgroup. Usually for system services it is not limited (nodes*services)
container_threads                  - used threads in cgroup. Usually not important for system services (nodes*services)
container_sockets                  - used sockets in cgroup. Usually not important for system services (nodes*services)
container_start_time_seconds       - container start. Possibly not needed for system services (nodes*services)
container_last_seen                - Not needed as system services are always running (nodes*services)
container_spec_.*                  - Everything related to cgroup specification and thus static data (nodes*services*5)
2021-07-20 12:50:02 +01:00
Paweł Krupa
80bb15bedd
Merge pull request #1255 from yeya24/fix-dashboards-definition-length-check 2021-07-19 09:56:09 +02:00
Yury Gargay
9b08b941f8 Update kubernetes-mixin
From b710a868a9
2021-07-14 18:51:36 +02:00
ben.ye
43adca8df7 fmt again
Signed-off-by: ben.ye <ben.ye@bytedance.com>
2021-07-13 19:56:38 -07:00
ben.ye
90b2751f06 fmt code
Signed-off-by: ben.ye <ben.ye@bytedance.com>
2021-07-13 19:48:01 -07:00
ben.ye
dee7762ae3 create dashboardDefinitions if rawDashboards or folderDashboards are specified
Signed-off-by: ben.ye <ben.ye@bytedance.com>
2021-07-13 19:39:01 -07:00
Damien Grisonnet
97e77e9996
Merge pull request #1231 from dgrisonnet/fix-adapter-queries
Consolidate intervals used in prometheus-adapter CPU queries
2021-07-07 13:48:02 +02:00
Damien Grisonnet
b9563b9c2d jsonnet: improve adapter queries readability
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-07-05 15:29:45 +02:00
Damien Grisonnet
8812e45501 jsonnet: readjust prometheus-adapter intervals
Previously, prometheus-adapter configuration wasn't taking into account
the scrape interval of kubelet, node-exporter and windows-exporter
leading to getting non fresh results, and even negative results from the
CPU queries when the irate() function was extrapolating data.
To fix that, we want to set the interval used in the irate() function in
the CPU queries to 4x scrape interval in order to extrapolate data
between the last two scrapes. This will improve the freshness of the cpu
usage exposed and prevent incorrect extrapolations.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-07-05 15:28:25 +02:00
Sunil Thaha
0280f4ddf9 jsonnet: kube-prometheus adapt to changes to veth interfaces names
With OVN, the container veth network interface names that used to start
with `veth` has now changed to `<rand-hex>{15}@if<number>`(see Related
Links below).

This patch adapts to the new change introduced in ovn and ignores the network
interfaces that match `[a-z0-9]{15}@if\d+` in addition to those starting
with `veth`

Related Links:
  - https://github.com/openshift/ovn-kubernetes/blob/master/go-controller/vendor/github.com/containernetworking/plugins/pkg/ip/link_linux.go#L107
  - https://github.com/openshift/ovn-kubernetes/blob/master/go-controller/pkg/cni/helper_linux.go#L148

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
2021-07-01 12:01:19 +10:00
fpetkovski
0ff173efea jsonnet: disable insecure cypher suites for prometheus-adapter
Running sslscan against the prometheus adapter secure port reports two
insecure SSL ciphers, ECDHE-RSA-DES-CBC3-SHA and DES-CBC3-SHA.

This commit removes those ciphers from the list.

Signed-off-by: fpetkovski <filip.petkovsky@gmail.com>
2021-06-22 14:17:09 +02:00
paulfantom
5ea10d80a1
jsonnet: fix label selector for coredns ServiceMonitor 2021-06-11 10:56:54 +02:00
anarcher
8bcfb98a1d feat(grafana): add env parameter for gradana component 2021-05-31 18:52:55 +09:00
Prem Saraswat
228f8ffdad Add support for feature-flags in Prometheus 2021-05-27 23:21:30 +05:30
paulfantom
ce7e86b93a
jsonnet/kube-prometheus: fix usage of latest thanos mixin 2021-05-25 16:03:39 +02:00
Piotr Piskiewicz
a8c344c848 jsonnet/components: fix missing resource config in blackbox exporter 2021-05-17 21:32:01 +02:00
Paweł Krupa
a1210f1eff
Merge pull request #1132 from paulfantom/ruleNamespaceSelector 2021-05-06 23:05:34 +02:00
Damien Grisonnet
a4a4d4b744 jsonnet: add PDB to prometheus-adapter
Adding a PodDisruptionBudget to prometheus-adapter ensure that at least
one replica of the adapter is always available. This make sure that even
during disruption the aggregated API is available and thus does not
impact the availability of the apiserver.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-05-05 16:15:25 +02:00
paulfantom
ee7fb97598
jsonnet: by default select rules from all available namespaces 2021-05-04 13:20:28 +02:00
paulfantom
b9a49678b2
jsonnet: fmt 2021-05-03 10:02:45 +02:00
paulfantom
2531c043dc
jsonnet: fix conflict resolution 2021-05-03 10:01:37 +02:00
Paweł Krupa
624c6c0108
Merge branch 'main' into feature/configRbacImage 2021-05-03 09:57:23 +02:00
Paweł Krupa
db7f3c9107
Merge pull request #1125 from kaflake/feature/configGrafanaImage
can change grafanaImage over $.values.common.images
2021-05-03 09:55:19 +02:00
Nagel, Felix
f107e8fb16 fix formatting issues 2021-05-03 06:59:10 +02:00
Nagel, Felix
14e6143037 replace double quotes with single quotes 2021-05-03 06:35:59 +02:00
Nagel, Felix
7e5d4196b9 can change grafanaImage over $.values.common.images 2021-04-30 14:05:23 +02:00
Nagel, Felix
5761267842 can change kubeRbacProxy over $.values.common.images 2021-04-30 13:48:34 +02:00
Nagel, Felix
be2964887f can change configmapReload over $.values.common.images 2021-04-30 12:46:48 +02:00
Paweł Krupa
a3d67f5219
Merge pull request #1095 from dgrisonnet/prometheus-adapter-ha
Make prometheus-adapter highly-available
2021-04-22 12:00:39 +02:00
Damien Grisonnet
4c6a06cf7e jsonnet: make prometheus-adapter highly-available
Prometheus-adapter is a component of the monitoring stack that in most
cases require to be highly available. For instance, we most likely
always want the autoscaling pipeline to be available and we also want to
avoid having no available backends serving the metrics API apiservices
has it would result in both the AggregatedAPIDown alert firing and the
kubectl top command not working anymore.

In order to make the adapter highly-avaible, we need to increase its
replica count to 2 and come up with a rolling update strategy and a
pod anti-affinity rule based on the kubernetes hostname to prevent the
adapters to be scheduled on the same node. The default rolling update
strategy for deployments isn't enough as the default maxUnavaible value
is 25% and is rounded down to 0. This means that during rolling-updates
scheduling will fail if there isn't more nodes than the number of
replicas. As for the maxSurge, the default should be fine as it is
rounded up to 1, but for clarity it might be better to just set it to 1.
For the pod anti-affinity constraints, it would be best if it was hard,
but having it soft should be good enough and fit most use-cases.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-04-22 09:57:14 +02:00
paulfantom
7b69800686
jsonnet: add default container annotation for KSM and blackbox
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-04-21 18:43:00 +02:00
ArthurSens
c96c639ef1 Add summary
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-04-16 18:06:47 +00:00
ArthurSens
92016ef68d Change message to description
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-04-16 18:06:47 +00:00
Paweł Krupa
07136d1d6e
Merge pull request #1039 from paulfantom/unify-config
jsonnet: unify internal configuration field name
2021-04-16 15:05:26 +02:00
Paweł Krupa
2ba8d8aca2
Merge pull request #1058 from mansikulkarni96/windows_exporter 2021-04-07 10:07:33 +02:00
mansikulkarni96
7ba0479433 jsonnet: Add windows_exporter queries for adapter
This commit includes windows_exporter metrics in the
node queries for the prometheus adapter configuration.
This will help obtain the resource metrics: memory and
CPU for Windows nodes. This change will also help in
displaying metrics reported through the 'kubectl top'
command which currently reports 'unknown' status for
Windows nodes.
2021-03-29 14:55:11 -04:00
viperstars
d1f401a73d add cluster role to list and watch ingresses in api group "networking.k8s.io" 2021-03-29 14:19:35 +08:00
paulfantom
0bf34a24f8
jsonnet: unify internal configuration field name
Signed-off-by: paulfantom <pawel@krupa.net.pl>
2021-03-22 12:48:55 +01:00
Petr Enkov
094cdb34e8
allow install grafana plugins 2021-03-22 11:57:11 +04:00
ArthurSens
2fa7ef162f Add externalLabels on Prometheus defaults
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-03-18 18:36:10 +00:00
paulfantom
0d2e0875d9
jsonnet/prometheus-adapter: include pause container in resource calculations 2021-03-16 15:17:22 +01:00
paulfantom
30a41d18d8
jsonnet: conditionally add PDB 2021-03-15 23:39:24 +01:00
paulfantom
9d327cb328
jsonnet: add PDB to alertmanager and prometheus pods 2021-03-15 16:33:18 +01:00
Matthias Loibl
8e5bf00c54
Merge pull request #984 from paulfantom/am_resources
jsonnet/alertmanager: add default alertmanager resource requirements
2021-03-08 10:20:20 +01:00
ArthurSens
bb2971e874 Add runbook_url annotation for custom mixins
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-03-05 14:07:01 +00:00
ArthurSens
e586afb280 Add runbook_url annotation to all alerts
Signed-off-by: ArthurSens <arthursens2005@gmail.com>
2021-03-05 13:39:40 +00:00
paulfantom
f7f817a79e
jsonnet/alertmanager: better name for prometheus-rule object 2021-03-01 13:26:46 +01:00
paulfantom
23c8d865f5
jsonnet/alertmanager: add default alertmanager resource requirements
Co-authored-by: Latch M <latch_mihaylov@homedepot.com>
2021-02-25 18:51:34 +01:00
Paweł Krupa
f691421c91
Merge pull request #960 from paulfantom/k8s-control-plane
Do not modify $.prometheus object when it is not needed (k8s control plane)
2021-02-23 10:30:17 +01:00
Frederic Branczyk
da05d36c31
Merge pull request #941 from paulfantom/ksm-krp-cpu
increase default CPU values for main kube-rbac-proxy sidecar in kube-state-metrics
2021-02-23 09:50:16 +01:00