prometheus

The Prometheus monitoring server

Flags

| Flag | Description | Default |
| --- | --- | --- |
| `-h`, `--help` | Show context-sensitive help (also try --help-long and --help-man). |  |
| `--version` | Show application version. |  |
| `--config.file` | Prometheus configuration file path. | `prometheus.yml` |
| `--config.auto-reload-interval` | Specifies the interval for checking and automatically reloading the Prometheus configuration file upon detecting changes. | `30s` |
| `--web.listen-address` `...` | Address to listen on for UI, API, and telemetry. Can be repeated. | `0.0.0.0:9090` |
| `--auto-gomaxprocs` | Automatically set GOMAXPROCS to match Linux container CPU quota. | `true` |
| `--auto-gomemlimit` | Automatically set GOMEMLIMIT to match Linux container or system memory limit. | `true` |
| `--auto-gomemlimit.ratio` | The ratio of reserved GOMEMLIMIT memory to the detected maximum container or system memory. | `0.9` |
| `--web.config.file` | [EXPERIMENTAL] Path to configuration file that can enable TLS or authentication. |  |
| `--web.read-timeout` | Maximum duration before timing out the read of a request and closing idle connections. | `5m` |
| `--web.max-connections` | Maximum number of simultaneous connections across all listeners. | `512` |
| `--web.max-notifications-subscribers` | Limits the maximum number of subscribers that can concurrently receive live notifications. If the limit is reached, new subscription requests will be denied until existing connections close. | `16` |
| `--web.external-url` | The URL under which Prometheus is externally reachable (for example, if Prometheus is served via a reverse proxy). Used for generating relative and absolute links back to Prometheus itself. If the URL has a path portion, it will be used to prefix all HTTP endpoints served by Prometheus. If omitted, relevant URL components will be derived automatically. |  |
| `--web.route-prefix` | Prefix for the internal routes of web endpoints. Defaults to the path of --web.external-url. |  |
| `--web.user-assets` | Path to static asset directory, available at /user. |  |
| `--web.enable-lifecycle` | Enable shutdown and reload via HTTP request. | `false` |
| `--web.enable-admin-api` | Enable API endpoints for admin control actions. | `false` |
| `--web.enable-remote-write-receiver` | Enable API endpoint accepting remote write requests. | `false` |
| `--web.remote-write-receiver.accepted-protobuf-messages` | List of remote write protobuf messages to accept when receiving remote writes. Supported values: prometheus.WriteRequest, io.prometheus.write.v2.Request. | `prometheus.WriteRequest` |
| `--web.enable-otlp-receiver` | Enable API endpoint accepting OTLP write requests. | `false` |
| `--web.console.templates` | Path to the console template directory, available at /consoles. | `consoles` |
| `--web.console.libraries` | Path to the console library directory. | `console_libraries` |
| `--web.page-title` | Document title of Prometheus instance. | `Prometheus Time Series Collection and Processing Server` |
| `--web.cors.origin` | Regex for CORS origin. It is fully anchored. Example: 'https?://(domain1\|domain2).com' | `.*` |
| `--storage.tsdb.path` | Base path for metrics storage. Use with server mode only. | `data/` |
| `--storage.tsdb.retention.time` | [DEPRECATED] How long to retain samples in storage. If neither this flag nor "storage.tsdb.retention.size" is set, the retention time defaults to 15d. Supported units: y, w, d, h, m, s, ms. This flag is deprecated; use the storage.tsdb.retention.time field in the config file instead. Use with server mode only. |  |
| `--storage.tsdb.retention.size` | [DEPRECATED] Maximum number of bytes that can be stored for blocks. A unit is required; supported units: B, KB, MB, GB, TB, PB, EB. Example: "512MB". Units are powers of 2, so 1KB is 1024B. This flag is deprecated; use the storage.tsdb.retention.size field in the config file instead. Use with server mode only. |  |
| `--storage.tsdb.no-lockfile` | Do not create lockfile in data directory. Use with server mode only. | `false` |
| `--storage.tsdb.head-chunks-write-queue-size` | Size of the queue through which head chunks are written to the disk to be m-mapped; 0 disables the queue completely. Experimental. Use with server mode only. | `0` |
| `--storage.tsdb.delay-compact-file.path` | Path to a JSON file listing uploaded TSDB blocks, e.g. the Thanos shipper meta file. If set, TSDB will only compact level-1 blocks that are marked as uploaded in that file, improving integration with external storage such as the Thanos sidecar. Higher-level compactions are not delayed. Use with server mode only. |  |
| `--storage.agent.path` | Base path for metrics storage. Use with agent mode only. | `data-agent/` |
| `--storage.agent.wal-compression` | Compress the agent WAL. If false, the --storage.agent.wal-compression-type flag is ignored. Use with agent mode only. | `true` |
| `--storage.agent.retention.min-time` | Minimum age samples may be before being considered for deletion when the WAL is truncated. Use with agent mode only. |  |
| `--storage.agent.retention.max-time` | Maximum age samples may be before being forcibly deleted when the WAL is truncated. Use with agent mode only. |  |
| `--storage.agent.no-lockfile` | Do not create lockfile in data directory. Use with agent mode only. | `false` |
| `--storage.remote.flush-deadline` | How long to wait while flushing samples on shutdown or config reload. | `1m` |
| `--storage.remote.read-sample-limit` | Maximum overall number of samples to return via the remote read interface, in a single query. 0 means no limit. This limit is ignored for streamed response types. Use with server mode only. | `5e7` |
| `--storage.remote.read-concurrent-limit` | Maximum number of concurrent remote read calls. 0 means no limit. Use with server mode only. | `10` |
| `--storage.remote.read-max-bytes-in-frame` | Maximum number of bytes in a single frame for streaming remote read response types before marshalling. Note that the client might have a limit on frame size as well. Defaults to 1MB, as recommended by protobuf. Use with server mode only. | `1048576` |
| `--rules.alert.for-outage-tolerance` | Maximum time to tolerate a Prometheus outage when restoring the "for" state of an alert. Use with server mode only. | `1h` |
| `--rules.alert.for-grace-period` | Minimum duration between an alert and its restored "for" state. This is maintained only for alerts with a configured "for" time greater than the grace period. Use with server mode only. | `10m` |
| `--rules.alert.resend-delay` | Minimum amount of time to wait before resending an alert to Alertmanager. Use with server mode only. | `1m` |
| `--rules.max-concurrent-evals` | Global concurrency limit for independent rules that can run concurrently. When set, "query.max-concurrency" may need to be adjusted accordingly. Use with server mode only. | `4` |
| `--alertmanager.notification-queue-capacity` | The capacity of the queue for pending Alertmanager notifications. Use with server mode only. | `10000` |
| `--alertmanager.notification-batch-size` | The maximum number of notifications per batch to send to the Alertmanager. Use with server mode only. | `256` |
| `--alertmanager.drain-notification-queue-on-shutdown` | Send any outstanding Alertmanager notifications when shutting down. If false, any outstanding Alertmanager notifications will be dropped when shutting down. Use with server mode only. | `true` |
| `--query.lookback-delta` | The maximum lookback duration for retrieving metrics during expression evaluations and federation. Use with server mode only. | `5m` |
| `--query.timeout` | Maximum time a query may take before being aborted. Use with server mode only. | `2m` |
| `--query.max-concurrency` | Maximum number of queries executed concurrently. Use with server mode only. | `20` |
| `--query.max-samples` | Maximum number of samples a single query can load into memory. Note that queries will fail if they try to load more samples than this into memory, so this also limits the number of samples a query can return. Use with server mode only. | `50000000` |
| `--enable-feature` `...` | Comma-separated feature names to enable. Valid options: exemplar-storage, expand-external-labels, memory-snapshot-on-shutdown, promql-per-step-stats, promql-experimental-functions, extra-scrape-metrics, auto-gomaxprocs, created-timestamp-zero-ingestion, concurrent-rule-eval, delayed-compaction, old-ui, otlp-deltatocumulative, promql-duration-expr, use-uncached-io, promql-extended-range-selectors. See https://prometheus.io/docs/prometheus/latest/feature_flags/ for more details. |  |
| `--agent` | Run Prometheus in 'Agent mode'. |  |
| `--log.level` | Only log messages with the given severity or above. One of: [debug, info, warn, error] | `info` |
| `--log.format` | Output format of log messages. One of: [logfmt, json] | `logfmt` |
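The `--storage.tsdb.delay-compact-file.path` flag gates level-1 compactions on an external record of uploaded blocks. The Go sketch below illustrates the idea under stated assumptions. The file is assumed to follow the Thanos shipper layout (`{"version": 1, "uploaded": [...]}`, commonly named `thanos.shipper.json`), and block IDs are treated as plain strings rather than ULIDs. `readUploaded` and `compactableLevel1` are hypothetical helpers, not Prometheus's TSDB code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// uploadMeta mirrors the ASSUMED layout of a Thanos shipper meta file, e.g.
// {"version": 1, "uploaded": ["<block ULID>", ...]}. Block IDs are plain
// strings here to avoid pulling in a ULID library.
type uploadMeta struct {
	Version  int      `json:"version"`
	Uploaded []string `json:"uploaded"`
}

// readUploaded parses the meta file and returns the set of uploaded block IDs.
func readUploaded(path string) (map[string]bool, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var m uploadMeta
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	uploaded := make(map[string]bool, len(m.Uploaded))
	for _, id := range m.Uploaded {
		uploaded[id] = true
	}
	return uploaded, nil
}

// compactableLevel1 is a simplified, hypothetical filter: it keeps only the
// level-1 blocks already listed as uploaded and leaves the rest out.
func compactableLevel1(blocks []string, uploaded map[string]bool) []string {
	var ready []string
	for _, id := range blocks {
		if uploaded[id] {
			ready = append(ready, id)
		}
	}
	return ready
}

func main() {
	dir, err := os.MkdirTemp("", "shipper")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// Write a small example meta file listing two uploaded blocks.
	path := filepath.Join(dir, "thanos.shipper.json")
	meta := `{"version": 1, "uploaded": ["01A", "01B"]}`
	if err := os.WriteFile(path, []byte(meta), 0o644); err != nil {
		panic(err)
	}

	uploaded, err := readUploaded(path)
	if err != nil {
		panic(err)
	}
	// Block "01C" is not listed yet, so it is held back from compaction.
	fmt.Println(compactableLevel1([]string{"01A", "01B", "01C"}, uploaded)) // prints [01A 01B]
}
```

In this sketch a block missing from the file is simply excluded; a caller would re-read the file on a later compaction cycle, so a slow uploader delays compaction instead of letting an unshipped block be merged away.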
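The `--storage.tsdb.retention.size` description says units are powers of 2, so "512MB" means 512 × 2^20 = 536,870,912 bytes. The hypothetical `parseRetentionSize` helper below illustrates that arithmetic. It accepts only integer values with one of the listed suffixes and is not the parser Prometheus itself uses.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseRetentionSize is a hypothetical helper: it converts an integer size
// with a required unit suffix into bytes, using powers of 2 (1KB = 1024B).
func parseRetentionSize(s string) (int64, error) {
	// Two-letter suffixes are checked before "B" so "512MB" is not read as "512M" + "B".
	units := []struct {
		suffix string
		mult   int64
	}{
		{"EB", 1 << 60}, {"PB", 1 << 50}, {"TB", 1 << 40},
		{"GB", 1 << 30}, {"MB", 1 << 20}, {"KB", 1 << 10}, {"B", 1},
	}
	for _, u := range units {
		if !strings.HasSuffix(s, u.suffix) {
			continue
		}
		n, err := strconv.ParseInt(strings.TrimSuffix(s, u.suffix), 10, 64)
		if err != nil {
			return 0, fmt.Errorf("invalid size %q: %w", s, err)
		}
		// Reject negative values and products that would overflow int64.
		if n < 0 || n > math.MaxInt64/u.mult {
			return 0, fmt.Errorf("size %q out of range", s)
		}
		return n * u.mult, nil
	}
	return 0, fmt.Errorf("size %q has no unit (B, KB, MB, GB, TB, PB, EB)", s)
}

func main() {
	for _, s := range []string{"512MB", "1KB", "2GB", "512"} {
		n, err := parseRetentionSize(s)
		fmt.Println(s, n, err)
	}
}
```

The last input, "512", fails because the flag requires a unit.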
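`--web.cors.origin` is described as fully anchored: the regex must match the entire `Origin` value, not just a substring. The sketch below assumes anchoring is done by wrapping the pattern as `^(?:pattern)$`. The `anchor` helper is hypothetical. The sketch also shows a pitfall in the documented example: the unescaped `.` in `.com` matches any character.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchor is a hypothetical helper that wraps a pattern so it must match the
// whole input, which is what "fully anchored" means for --web.cors.origin.
func anchor(pattern string) (*regexp.Regexp, error) {
	return regexp.Compile("^(?:" + pattern + ")$")
}

func main() {
	re, err := anchor(`https?://(domain1|domain2).com`)
	if err != nil {
		panic(err)
	}
	for _, origin := range []string{
		"https://domain1.com",         // whole string matches: allowed
		"http://domain2.com",          // whole string matches: allowed
		"https://domain1.com.evil.io", // trailing text: rejected by the anchors
		"https://domain1xcom",         // allowed: the unescaped "." matches "x"
	} {
		fmt.Println(origin, re.MatchString(origin))
	}
}
```

Escaping the dot, as in `https?://(domain1|domain2)\.com`, closes that gap.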
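When `--auto-gomemlimit` is enabled, the `--auto-gomemlimit.ratio` description implies a soft limit of ratio × detected memory. The sketch assumes a hypothetical 4 GiB container limit. It applies the result with the standard library's `runtime/debug.SetMemoryLimit`, which has the same effect as setting the `GOMEMLIMIT` environment variable. `memLimitFor` is an illustrative helper, not Prometheus code.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// memLimitFor returns the soft memory limit as ratio × detected limit,
// following the --auto-gomemlimit.ratio description.
func memLimitFor(detected int64, ratio float64) int64 {
	return int64(float64(detected) * ratio)
}

func main() {
	// Hypothetical 4 GiB container memory limit with the default ratio of 0.9.
	const detected int64 = 4 << 30
	limit := memLimitFor(detected, 0.9)
	// Equivalent to exporting GOMEMLIMIT before the process starts.
	previous := debug.SetMemoryLimit(limit)
	fmt.Printf("GOMEMLIMIT set to %d bytes (previous: %d)\n", limit, previous)
}
```

With the default ratio of 0.9, a 4 GiB limit yields roughly 3.6 GiB (3,865,470,566 bytes).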
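With `--web.enable-lifecycle` set, a configuration reload can be triggered by sending an HTTP POST to `/-/reload`. The sketch below assumes a server at `http://localhost:9090`, which matches the default `--web.listen-address` port, and uses only `net/http`. `newReloadRequest` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newReloadRequest builds a POST /-/reload request for the given base URL.
func newReloadRequest(baseURL string) (*http.Request, error) {
	return http.NewRequest(http.MethodPost, strings.TrimSuffix(baseURL, "/")+"/-/reload", nil)
}

func main() {
	req, err := newReloadRequest("http://localhost:9090")
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	// A non-2xx status (for example when --web.enable-lifecycle is not set)
	// means the reload was not triggered.
	fmt.Println("status:", resp.Status)
}
```

Without the flag, the endpoint answers with an error status instead of reloading, so the status code is worth checking.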