---
layout: docs
page_title: 'Request Limiter'
description: >-
  Vault provides an adaptive concurrency limiter to protect the Vault server
  from overload.
---
# Request Limiter
@include 'alerts/enterprise-only.mdx'
<Warning title="Beta (Deprecated)">
The request limiter was released in Vault 1.16 as a Beta
feature. During Beta evaluation, we found that an alternative approach better
met the needs of our users. This feature will be removed from Vault in a future
release. It is replaced with [adaptive overload protection](/vault/docs/concepts/adaptive-overload-protection).
</Warning>
This document contains conceptual information about the **Request Limiter** and
its user-facing effects.
## Preventing overload
The Request Limiter aims to prevent overload by proactively detecting latency
deviation from a baseline and adapting the number of allowed in-flight requests.
It does this in two phases at the beginning of each HTTP request (a simplified
sketch follows the list below):
1. Consult the current number of allowed in-flight requests. If the new request
would exceed this limit, immediately reject it, indicating that the client
should retry later.
2. If the request is allowed, begin a measurement of its latency, allowing the
Request Limiter to calculate a new limit.
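
The sketch below illustrates these two phases as HTTP middleware. It is a
simplified illustration only, written against assumed type and field names
(`requestLimiter`, `inFlight`, `limit`), and does not mirror Vault's internal
implementation.

```go
package limiter

import (
	"net/http"
	"sync/atomic"
	"time"
)

// requestLimiter is a hypothetical stand-in for an adaptive concurrency
// limiter: it tracks in-flight requests against a limit that is recalculated
// from observed latencies.
type requestLimiter struct {
	inFlight int64 // current number of in-flight requests
	limit    int64 // adaptive limit, adjusted from latency measurements
}

// Admit applies the two phases described above to an incoming request.
func (l *requestLimiter) Admit(w http.ResponseWriter, r *http.Request, next http.Handler) {
	// Phase 1: if this request would exceed the current limit, reject it
	// immediately with a retryable 503 so the client can back off and retry.
	if atomic.AddInt64(&l.inFlight, 1) > atomic.LoadInt64(&l.limit) {
		atomic.AddInt64(&l.inFlight, -1)
		http.Error(w, "service unavailable", http.StatusServiceUnavailable)
		return
	}
	defer atomic.AddInt64(&l.inFlight, -1)

	// Phase 2: the request is allowed, so measure its latency and feed the
	// sample back into the limiter so it can calculate a new limit.
	start := time.Now()
	next.ServeHTTP(w, r)
	l.recordLatency(time.Since(start))
}

// recordLatency would feed samples into an adaptive algorithm that adjusts
// l.limit based on deviation from the latency baseline; the algorithm itself
// is omitted from this sketch.
func (l *requestLimiter) recordLatency(d time.Duration) {}
```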
## Resource constraints
The Request Limiter intentionally focuses on preventing overload derived from
resource-constrained operations on the Vault server. Vault focuses on two
specific types of resource constraints which commonly cause issues in production
workloads:
1. Write latency in the storage backend, resulting in a growing queue of updates
to be flushed. These writes originate primarily from `Write`-based HTTP methods.
2. CPU utilization caused by computationally expensive PKI issue requests
(generally for RSA keys). Large numbers of these requests can consume all CPU
resources, preventing timely processing of other requests such as heartbeats and
health checks.
Storage constraints can be accounted for by limiting logical requests according
to their `http.Method`. We only measure and limit requests with `Write`-based
HTTP methods. Read requests do not generally cause storage updates, meaning that
their latencies are unlikely to be correlated with storage constraints.
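
As a rough illustration of that split, a hypothetical helper like the one
below (not Vault's actual source) captures the idea of classifying requests
by HTTP method:

```go
package limiter

import "net/http"

// isWriteMethod is a hypothetical helper: only write-based HTTP methods are
// measured and limited, because their latency is the signal most closely
// correlated with storage write pressure.
func isWriteMethod(method string) bool {
	switch method {
	case http.MethodPut, http.MethodPost, http.MethodPatch, http.MethodDelete:
		return true
	default:
		// GET, LIST, and other read-oriented methods pass through unlimited.
		return false
	}
}
```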
CPU constraints are accounted for using the same underlying library and
technique; however, they require special treatment. The maximum number of
concurrent `pki/issue` requests found in testing (again, specifically for RSA
keys) is far lower than the minimum tolerable write request rate.
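
Conceptually, that special treatment amounts to keeping separate limiters for
the two constraint types, each converging on its own limit. Continuing the
sketch above with hypothetical names (`limiterRegistry`, `limiterFor`) and a
deliberately naive path check:

```go
package limiter

import "strings"

// limiterRegistry holds one adaptive limiter per constraint type, so the much
// lower tolerable concurrency for pki/issue does not throttle ordinary writes.
type limiterRegistry struct {
	writes   *requestLimiter // guards storage write latency
	pkiIssue *requestLimiter // guards CPU-heavy PKI issuance
}

// limiterFor picks the limiter that applies to a request, if any. Real path
// matching would need to be mount-aware; the substring check is illustrative.
func (reg *limiterRegistry) limiterFor(method, path string) *requestLimiter {
	switch {
	case strings.Contains(path, "/issue/"):
		return reg.pkiIssue
	case isWriteMethod(method):
		return reg.writes
	default:
		return nil // read requests are not limited
	}
}
```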
In both cases, utilization will be effectively throttled before Vault reaches
any degraded state. The resulting `503 - Service Unavailable` is a retryable
HTTP response code: clients should handle it by retrying with jitter and
exponential backoff so that the request eventually succeeds. Vault's API
`Client` implementation does this automatically using the go-retryablehttp
library.
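
For example, a client built with Vault's Go API package inherits this retry
behavior; the retry count and secret path below are arbitrary values chosen
for illustration:

```go
package main

import (
	"fmt"
	"log"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	// The default configuration already wires in go-retryablehttp with
	// jittered exponential backoff for retryable responses such as 503.
	config := vault.DefaultConfig()

	// Allow a few more retries so requests rejected by the Request Limiter
	// have extra chances to succeed once load subsides.
	config.MaxRetries = 5

	client, err := vault.NewClient(config)
	if err != nil {
		log.Fatal(err)
	}

	// Any request made through the client is retried automatically when
	// Vault responds with a retryable 503 - Service Unavailable.
	secret, err := client.Logical().Read("secret/data/example")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(secret)
}
```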
## Read requests
HTTP methods such as `GET` and `LIST` are not subject to write request
limiting. This allows operators to continue querying server state without
needing to retry.
## Vault server overloaded
When Vault has reached capacity, new requests will be immediately rejected with a
retryable `503 - Service Unavailable`
[error](/vault/docs/concepts/adaptive-overload-protection/vault-server-temporarily-overloaded).