181 Commits

Author SHA1 Message Date
Ed Schouten
a3e9628e0c
Kubernetes service discovery: add provider ID label (#9603)
When using Kubernetes on cloud providers, nodes will have the
spec.providerID field populated to contain the cloud provider specific
name of the EC2/GCE/... instance.

Let's expose this information as an additional label, so that it's
easier to annotate metrics and alerts to contain the cloud provider
specific name of the instance to which it pertains.

Signed-off-by: Ed Schouten <eschouten@apple.com>
2021-12-06 22:27:11 +01:00
XU
3563db20e0
Fix docs/configuration typo (#9922)
Signed-off-by: qqbuby <qqbuby@gmail.com>
2021-12-06 16:21:48 +05:30
Callum Styan
086ca90b24
Update exemplar docs based on changes to exemplar storage configuration (#9868)
* Update exemplar docs based on changes from #8974

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix missing code block closing + unindent one level.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2021-12-01 10:30:08 +01:00
teuto.net Netzdienste GmbH
3ce6b48df6
fixes wrong metric name in documentation (#9828)
* fixes wrong metric name, see https://github.com/prometheus/prometheus/blob/main/discovery/openstack/hypervisor.go#L35

Signed-off-by: teuto.net Netzdienste GmbH <github@teuto.net>

* fixes parameter doc, sorted alphabetically

Signed-off-by: teuto.net Netzdienste GmbH <github@teuto.net>
2021-11-19 15:06:01 -05:00
Hu Shuai
5a9be19062
Fix a typo and the grammar in docs/configuration/configuration.md (#9717)
2021-11-11 07:10:40 -05:00
Bryan Boreham
1ed94142fc
remote-write: slow down retries to avoid DDOS (#9634)
* remote-write: slow down retries to avoid DDOS

Increase the default max retry time from 100ms to 5 seconds.

Remote write calls are retried after a recoverable error such as the
back-end returning 500. Prometheus waits the minimum time and retries,
then doubles the wait on each subsequent retry until the maximum is
reached.

If some data is still getting through, remote-write will also increase
shards, and the default maximum is 200. 200 shards sending every 100ms
is 2,000 calls per second, to a back-end that is already in trouble.

5 seconds was chosen to match the default BatchSendDeadline: if we can
afford to wait that long for no response, then we can wait the same time
to retry. We will reach 5 seconds after 9 successive failures.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Update config doc for max_backoff change

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2021-11-09 14:08:24 -08:00
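The retry behaviour described in #9634 can be sketched as follows. This is an illustration, not the Prometheus implementation: the function names are hypothetical, and the 30ms minimum backoff is an assumption (the commit message states only the old and new maximums). The sketch doubles the wait after each failure up to the cap, and estimates the worst-case request rate when every shard retries once per backoff interval.

```python
from typing import List


def backoff_schedule(min_ms: int, max_ms: int, failures: int) -> List[int]:
    """Wait (ms) before each successive retry: start at min_ms, double, cap at max_ms."""
    waits = []
    wait = min(min_ms, max_ms)
    for _ in range(failures):
        waits.append(wait)
        wait = min(wait * 2, max_ms)
    return waits


def worst_case_calls_per_second(shards: int, backoff_ms: int) -> float:
    """Request rate if every shard retries once per backoff interval."""
    return shards * 1000 / backoff_ms


# Assumption: 30ms minimum backoff. Only the 100ms and 5s maximums come from the commit.
old_schedule = backoff_schedule(30, 100, 9)
new_schedule = backoff_schedule(30, 5000, 9)
print(old_schedule)  # [30, 60, 100, 100, 100, 100, 100, 100, 100]
print(new_schedule)  # [30, 60, 120, 240, 480, 960, 1920, 3840, 5000]

# 200 shards at the old 100ms cap versus the new 5s cap.
print(worst_case_calls_per_second(200, 100))   # 2000.0
print(worst_case_calls_per_second(200, 5000))  # 40.0
```

Under the assumed 30ms minimum, the ninth wait is the first to hit the 5 second cap, and the worst-case rate for 200 shards drops from 2,000 to 40 calls per second.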
Julien Pivotto
77f411b2ec
Enable tls_config in oauth2 (#9550)
* Enable tls_config in oauth2

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-10-20 23:10:18 +02:00
Levi Harrison
89a6ebd799
Add common HTTP client to Azure SD (#9267)
* Add `proxy_url` option to Azure SD

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-10-20 23:09:58 +02:00
Witek Bedyk
cda2dbbef6
Add Uyuni service discovery (#8190)
* Add Uyuni service discovery

Signed-off-by: Witek Bedyk <witold.bedyk@suse.com>

Co-authored-by: Joao Cavalheiro <jcavalheiro@suse.de>
Co-authored-by: Marcelo Chiaradia <mchiaradia@suse.com>
Co-authored-by: Stefano Torresi <stefano@torresi.io>
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
2021-10-19 01:00:44 +02:00
la3mmchen
6d3a4ed711 fix/9269 add documentation for endpointslice
This commit adds documentation for the kubernetes_sd_configs: endpointslice feature.

Signed-off-by: la3mmchen <alex@k3wl.net>
2021-10-03 21:30:39 +02:00
Julien Pivotto
8920024323 Add PuppetDB service discovery
We have been Puppet users for 10 years and we are users of
https://github.com/camptocamp/prometheus-puppetdb-sd

However, that file_sd implementation contains business logic and
assumptions around e.g. the modules which you are using.

This pull request adds a simple PuppetDB service discovery, which will
enable more use cases than the upstream sd.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-09-16 16:54:26 +02:00
Levi Harrison
70f597b033
Configure Scrape Interval and Timeout Via Relabeling (#8911)
* Configure scrape interval and timeout with labels

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-08-31 17:37:32 +02:00
Julien Pivotto
cab96a06ef
Merge release 2.29 in main (#9196)
* PromQL: Fix start and end keywords masking label and metric names

This commit fixes an issue with the "at modifier" that introduced two
new keywords: `start` and `end`. In grouping options and in metric
names, these keywords took precedence over metric or label names, so
that those metrics and labels could no longer be referenced.

Signed-off-by: Clayton Peters <clayton.peters@man.com>

* Add in additional tests for metrics and/or labels called start/end.

Signed-off-by: Clayton Peters <clayton.peters@man.com>

* *: Cut 2.29.0-rc.0

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* VERSION: bump to 2.29.0-rc.0

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Remove experimental wording on size-based retention

Followup of #9004

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Fix PR reference in changelog

Signed-off-by: George Brighton <george@gebn.co.uk>

* Describe EC2 availability zone IDs at most once per refresh (#9142)

Signed-off-by: George Brighton <george@gebn.co.uk>

* Describe EC2 availability zones at most once per SD load

Closes #9142.

Signed-off-by: George Brighton <george@gebn.co.uk>

* Incorporate feedback

Signed-off-by: George Brighton <george@gebn.co.uk>

* Integrate feedback

Signed-off-by: George Brighton <george@gebn.co.uk>

* Add a compatibility note for macOS users.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* *: Cut v2.29.0-rc.1

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Fix `kuma_sd` targetgroup reporting (#9157)

* Bundle all xDS targets into a single group

Signed-off-by: austin ce <austin.cawley@gmail.com>

* *: cut v2.29.0-rc.2

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* Rename links

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* bump codemirror-promql to 0.17.0

Signed-off-by: Augustin Husson <husson.augustin@gmail.com>

* *: cut v2.29.0

Signed-off-by: Frederic Branczyk <fbranczyk@gmail.com>

* tsdb: align atomically accessed int64 (#9192)

This prevents a panic in 32-bit archs:
https://pkg.go.dev/sync/atomic#pkg-note-BUG

Fixed #9190

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* Release 2.29.1 (#9193)

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

Co-authored-by: Clayton Peters <clayton.peters@man.com>
Co-authored-by: Frederic Branczyk <fbranczyk@gmail.com>
Co-authored-by: George Brighton <george@gebn.co.uk>
Co-authored-by: Austin Cawley-Edwards <austin.cawley@gmail.com>
Co-authored-by: Levi Harrison <git@leviharrison.dev>
Co-authored-by: Augustin Husson <husson.augustin@gmail.com>
2021-08-12 18:38:06 +02:00
TJ Hoplock
7baf084092
optimize Linode SD by polling for event changes during refresh (#8980)
* optimize Linode SD by polling for event changes during refresh

Most accounts are fairly "static", in the sense that they're not cycling
through instances constantly. So rather than do a full refresh every
interval and potentially make several behind-the-scenes paginated API
calls, this will now poll the `/account/events/` endpoint every minute
with a list of events that we care about. If a matching event is found,
we then do a full refresh.

Co-authored-by: William Smith <wsmith@linode.com>
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Signed-off-by: William Smith <wsmith@linode.com>
2021-08-04 12:05:49 +02:00
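The refresh strategy described in #8980 can be sketched roughly as below. This is a simplified illustration, not the Linode SD code: the class, the two callables, and the event action names are hypothetical stand-ins for the cheap events poll and the expensive paginated listing. The sketch assumes `list_events` returns only the actions seen since the previous poll.

```python
from typing import Callable, Iterable, List, Optional, Set


class EventPollingRefresher:
    """Run the expensive full listing only when a relevant event has been seen."""

    def __init__(
        self,
        list_events: Callable[[], Iterable[str]],  # hypothetical: cheap poll of new event actions
        list_targets: Callable[[], List[str]],     # hypothetical: expensive paginated listing
        relevant_actions: Set[str],
    ) -> None:
        self._list_events = list_events
        self._list_targets = list_targets
        self._relevant = relevant_actions
        self._cached: Optional[List[str]] = None

    def refresh(self) -> List[str]:
        # Always consume new events, then decide whether a full refresh is needed.
        changed = any(action in self._relevant for action in self._list_events())
        if self._cached is None or changed:
            self._cached = self._list_targets()
        return self._cached


# Fakes standing in for the API, so the sketch runs without network access.
pending: List[str] = []
calls = {"full": 0}


def fake_events() -> List[str]:
    seen = list(pending)
    pending.clear()
    return seen


def fake_targets() -> List[str]:
    calls["full"] += 1
    return ["192.0.2.10:9100"]


refresher = EventPollingRefresher(fake_events, fake_targets, {"linode_create", "linode_delete"})
refresher.refresh()              # first call: nothing cached, full refresh
refresher.refresh()              # no events: cached targets reused
pending.append("linode_create")
refresher.refresh()              # matching event: full refresh again
print(calls["full"])             # prints 2
```

The first refresh always performs a full listing because nothing is cached yet. After that, only a matching event triggers the expensive call.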
Julien Pivotto
03bee3b5df
Merge pull request #9125 from LeviHarrison/docker_sd-host-networking
docker_sd: Support host network mode
2021-08-04 01:14:39 +02:00
Levi Harrison
c1b1b826ce HostNetworkHost -> HostNetworkingHost
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-08-03 05:58:49 -06:00
Julien Pivotto
24165adadc
Merge pull request #9112 from darshanime/add_computer_name
Add computer name to azure sd
2021-07-30 09:58:49 +02:00
Levi Harrison
3556302c76
Added docs
Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-07-27 23:33:40 -04:00
Julien Pivotto
dcba645366
Merge pull request #8978 from jfreeland/feat/additional-gce-interfaces
feat: explicit gce interface ipv4 address metadata
2021-07-26 19:38:59 +02:00
darshanime
c8a2ffdb72 Add computer name to azure sd
Signed-off-by: darshanime <deathbullet@gmail.com>
2021-07-25 22:07:44 +05:30
Julien Pivotto
79d354ad2e
Merge pull request #8844 from austince/feat/discovery-xds
Add base xDS REST SD and kuma_sd implementation
2021-07-23 09:46:36 +02:00
George Brighton
bc0e76c8a3
Add AZ ID label to discovered EC2 targets (#8896)
* Add AZ ID to EC2 SD

Signed-off-by: George Brighton <george@gebn.co.uk>
2021-07-23 09:42:03 +02:00
austin ce
3593b20cdb
Add documentation for kuma_sd configuration
Signed-off-by: austin ce <austin.cawley@gmail.com>
2021-07-21 12:55:02 -04:00
Lukas Kämmerling
263847e64a
hcloud discovery: Add new labelpresent label (#9028)
* Add new labelpresent label

Signed-off-by: Lukas Kämmerling <lukas.kaemmerling@hetzner-cloud.de>
2021-07-03 01:51:50 +02:00
Joey Freeland
8017dd7242 chore: always append interface ipv4 with api interface name
Signed-off-by: Joey Freeland <joey@free.land>
2021-06-29 09:01:34 -07:00
Levi Harrison
d5c3c567d3
Remote Write: Add max samples per metadata send (#8959)
* Added MaxSamplesPerSend

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added tests

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Fixed order of require

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Added docs

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* writes -> writesReceived

Signed-off-by: Levi Harrison <git@leviharrison.dev>

* Improved send loop

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-24 15:39:50 -07:00
Joey Freeland
77e25cf2e5 feat: gce metadata for additional interfaces
Signed-off-by: Joey Freeland <joey@free.land>
2021-06-21 21:37:04 -07:00
3Xpl0it3r
a0bac4b488
add kubeconfig support in discovery module (#8811)
Signed-off-by: 3Xpl0it3r <shouc.wang@hotmail.com>
2021-06-17 12:41:50 +02:00
Frederic Branczyk
039b651450
Merge pull request #8916 from Evesy/main
Add class label to kubernetes ingress discovery
2021-06-14 13:40:08 +02:00
koolwithk
80d69dd4e5
Docs - fix wrong spell
2021-06-14 09:38:06 +05:30
Levi Harrison
faed8df31d
Enable reading consul token from file (#8926)
* Adopted common http client

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-06-12 00:06:59 +02:00
Julien Pivotto
9444698ae2
http_sd (#8839)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-06-11 18:04:45 +02:00
Mike Eves
7e1111ff14 Update label from class to class_name
Signed-off-by: Mike Eves <michael.eves@autotrader.co.uk>
2021-06-11 13:45:41 +01:00
Mike Eves
aab51ffe2a Tweak docs
Signed-off-by: Mike Eves <michael.eves@autotrader.co.uk>
2021-06-11 11:27:15 +01:00
Mike Eves
22b16c30de Fix typo
Signed-off-by: Mike Eves <michael.eves@autotrader.co.uk>
2021-06-11 11:27:15 +01:00
Mike Eves
7e65ad3e43 Add class label to kubernetes ingress discovery
Signed-off-by: Mike Eves <michael.eves@autotrader.co.uk>
2021-06-11 11:27:15 +01:00
Frederic Hemberger
39a87fd9d2 consul_sd: Add namespace support for Consul Enterprise
Signed-off-by: Frederic Hemberger <mail@frederic-hemberger.de>
2021-06-09 16:35:02 +02:00
Julien Pivotto
609ba54b8f
Mark body_size_limit as experimental. (#8886)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-06-02 16:32:08 +01:00
Julien Pivotto
20c6739adc
Merge pull request #8833 from hanjm/feature/add-scape-read-body-limit
Add body_size_limit to prevent large response bodies from bad targets causing Prometheus server OOM (#8827)
2021-06-02 09:24:59 +02:00
TJ Hoplock
dc22c65349
Add Linode Service Discovery (#8846)
* Add Linode Service Discovery

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2021-06-01 20:32:36 +02:00
hanjm
1df05bfd49 Add body_size_limit to prevent large response bodies from bad targets causing Prometheus server OOM (#8827)
Signed-off-by: hanjm <hanjinming@outlook.com>
2021-05-29 07:05:42 +08:00
Sandro
0ffcddbee8
Fix indentation
Signed-off-by: Sandro Jäckel <sandro.jaeckel@gmail.com>
2021-05-16 05:27:05 +02:00
Callum Styan
8fd73b1d28
Add Exemplar Remote Write support (#8296)
* Write exemplars to the WAL and send them over remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Update example for exemplars, print data in a more obvious format.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add metrics for remote write of exemplars.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix incorrect slices passed to send in remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* We need to unregister the new metrics.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address review comments

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Order of exemplar append vs write exemplar to WAL needs to change.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Condense sample/exemplar delivery tests to parameterized sub-tests

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Rename test methods for clarity now that they also handle exemplars

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Rename counter variable. Fix instances where metrics were not updated correctly

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Add exemplars to LoadWAL benchmark

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* last exemplars timestamp metric needs to convert value to seconds with
ms precision

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Process exemplar records in a separate go routine when loading the WAL.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address review comments related to clarifying comments and variable
names. Also refactor sample/exemplar to enqueue prompb types.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Regenerate types proto with comments, update protoc version again.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Put remote write of exemplars behind a feature flag.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address some of Ganesh's review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Move exemplar remote write feature flag to a config file field.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address Bartek's review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Don't allocate exemplar buffers in queue_manager if we're not going to
send exemplars over remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add ValidateExemplar function, validate exemplars when appending to head
and log them all to WAL before adding them to exemplar storage.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address more review comments from Ganesh.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add exemplar total label length check.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address a few last review comments

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Co-authored-by: Martin Disibio <mdisibio@gmail.com>
2021-05-06 13:53:52 -07:00
Damien Grisonnet
b50f9c1c84
Add label scrape limits (#8777)
* scrape: add label limits per scrape

Add three new limits to the scrape configuration to provide some
mechanism to defend against an unbounded number of labels and excessive
label lengths. If any of these limits are broken by a sample from a
scrape, the whole scrape will fail. For all of these configuration
options, a zero value means no limit.

The `label_limit` configuration will provide a mechanism to bound the
number of labels on each scraped sample to a user-defined limit.
This limit will be tested against the sample labels plus the discovery
labels, but it will exclude the __name__ from the count since it is a
mandatory Prometheus label to which applying constraints isn't
meaningful.

The `label_name_length_limit` and `label_value_length_limit` will
prevent having labels of excessive lengths. These limits also skip the
__name__ label for the same reasons as the `label_limit` option and will
also make the scrape fail if any sample has a label name/value length
that exceeds the predefined limits.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: add metrics and alert to label limits

Add three gauges, one for each label limit, to easily access the
limit set by a certain scrape target.
Also add a counter to count the number of targets that exceeded the
label limits and thus were dropped. This is useful for the
`PrometheusLabelLimitHit` alert that will notify the users that scraping
some targets failed because they had samples exceeding the label limits
defined in the scrape configuration.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: apply label limits to __name__ label

Apply limits to the __name__ label that was previously skipped and
truncate the label names and values in the error messages as they can be
very long.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: remove label limits gauges and refactor

Remove `prometheus_target_scrape_pool_label_limit`,
`prometheus_target_scrape_pool_label_name_length_limit`, and
`prometheus_target_scrape_pool_label_value_length_limit` as they are not
really useful since we don't have the information on the labels in it.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-05-06 09:56:21 +01:00
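The label limits introduced in #8777 can be sketched as below. This is a minimal illustration, not the scrape code itself: the function names and error strings are hypothetical. It follows the later revision in the commit, where `__name__` is subject to the limits like any other label, and it treats a zero value as "no limit". The truncation width of 32 characters is an arbitrary assumption.

```python
from typing import Dict, Iterable, Optional


def _truncate(text: str, width: int = 32) -> str:
    """Shorten long names and values so error messages stay readable."""
    return text if len(text) <= width else text[:width] + "..."


def check_label_limits(
    labels: Dict[str, str],
    label_limit: int = 0,
    label_name_length_limit: int = 0,
    label_value_length_limit: int = 0,
) -> Optional[str]:
    """Return an error message if the sample breaks a limit, otherwise None."""
    if label_limit > 0 and len(labels) > label_limit:
        return f"label_limit exceeded: {len(labels)} labels, limit {label_limit}"
    for name, value in labels.items():
        if 0 < label_name_length_limit < len(name):
            return f"label_name_length_limit exceeded: {_truncate(name)}"
        if 0 < label_value_length_limit < len(value):
            return f"label_value_length_limit exceeded: {_truncate(name)}={_truncate(value)}"
    return None


def scrape_passes(samples: Iterable[Dict[str, str]], **limits: int) -> bool:
    """The whole scrape fails if any single sample breaks a limit."""
    return all(check_label_limits(sample, **limits) is None for sample in samples)


samples = [
    {"__name__": "up", "job": "node"},
    {"__name__": "http_requests_total", "job": "api", "path": "/" + "x" * 200},
]
print(scrape_passes(samples))                               # True: zero means no limit
print(scrape_passes(samples, label_limit=2))                # False: second sample has 3 labels
print(scrape_passes(samples, label_value_length_limit=64))  # False: path value is too long
```

A single offending sample is enough to reject the scrape, which matches the all-or-nothing behaviour the commit message describes.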
Levi Harrison
fa184a5fc3
Add OAuth 2.0 Config (#8761)
* Introduced oauth2 config into the codebase

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-04-28 14:47:52 +02:00
n888
7c028d59c2
Add lightsail service discovery (#8693)
Signed-off-by: N888 <drifto@gmail.com>
2021-04-28 11:29:12 +02:00
Robert Jacob
b253056163
Implement Docker discovery (#8629)
* Implement Docker discovery

Signed-off-by: Robert Jacob <xperimental@solidproject.de>
2021-03-29 22:30:23 +02:00
Julien Pivotto
5a6d244b00 Scaleway SD: Add the ability to read token from file
Prometheus adds the ability to read secrets from files. This adds
this feature for the Scaleway service discovery.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-03-25 00:52:33 +01:00
Andrew Starr-Bochicchio
da8a8585f5 Add vpc label to docs.
Signed-off-by: Andrew Starr-Bochicchio <a.starr.b@gmail.com>
2021-03-24 17:05:16 -04:00
Julien Pivotto
49016994ac Switch to alertmanager api v2
According to the 2.25 release notes, 2.26 should switch to alertmanager
api v2 by default.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-03-20 01:01:10 +01:00