This PR introduces a new testonly endpoint for introspecting the
RequestLimiter state. It makes use of the endpoint to verify that changes to
the request_limiter config are honored across reload.
In the future, we may choose to make the sys/internal/request-limiter/status
endpoint available in normal binaries, but this is an expedient way to expose
the status for testing without having to rush the design.
In order to re-use as much of the existing command package utility funcionality
as possible without introducing sprawling code changes, I introduced a new
server_util.go and exported some fields via accessors.
The tests shook out a couple of bugs (including a deadlock and lack of
locking around the core limiterRegistry state).
This commit introduces two new adaptive concurrency limiters in Vault,
which should handle overloading of the server during periods of
untenable request rate. The limiter adjusts the number of allowable
in-flight requests based on latency measurements performed across the
request duration. This approach allows us to reject entire requests
prior to doing any work and prevents clients from exceeding server
capacity.
The limiters intentionally target two separate vectors that have been
proven to lead to server over-utilization.
- Back pressure from the storage backend, resulting in bufferbloat in
the WAL system. (enterprise)
- Back pressure from CPU over-utilization via PKI issue requests
(specifically for RSA keys), resulting in failed heartbeats.
Storage constraints can be accounted for by limiting logical requests
according to their http.Method. We only limit requests with write-based
methods, since these will result in storage Puts and exhibit the
aforementioned bufferbloat.
CPU constraints are accounted for using the same underlying library and
technique; however, they require special treatment. The maximum number
of concurrent pki/issue requests found in testing (again, specifically
for RSA keys) is far lower than the minimum tolerable write request
rate. Without separate limiting, we would artificially impose limits on
tolerable request rates for non-PKI requests. To specifically target PKI
issue requests, we add a new PathsSpecial field, called limited,
allowing backends to specify a list of paths which should get
special-case request limiting.
For the sake of code cleanliness and future extensibility, we introduce
the concept of a LimiterRegistry. The registry proposed in this PR has
two entries, corresponding with the two vectors above. Each Limiter
entry has its own corresponding maximum and minimum concurrency,
allowing them to react to latency deviation independently and handle
high volumes of requests to targeted bottlenecks (CPU and storage).
In both cases, utilization will be effectively throttled before Vault
reaches any degraded state. The resulting 503 - Service Unavailable is a
retryable HTTP response code, which can be handled to gracefully retry
and eventually succeed. Clients should handle this by retrying with
jitter and exponential backoff. This is done within Vault's API, using
the go-retryablehttp library.
Limiter testing was performed via benchmarks of mixed workloads and
across a deployment of agent pods with great success.
* wip
* Work on the tuneable allowance and some bugs
* Call handleCancellableRequest instead, which gets the audit order more correct and includes the preauth response
* Get rid of no longer needed operation
* Phew, this wasn't necessary
* Add auth error handling by the backend, and fix a bug with handleInvalidCredentials
* Cleanup req/resp naming
* Use the new form, and data
* Discovered that tokens werent really being checked because isLoginRequest returns true for the re-request into the backend, when it shouldnt
* Add a few more checks in the delegated request handler for bad inputs
- Protect the delegated handler from bad inputs from the backend such
as an empty accessor, a path that isn't registered as a login request
- Add similar protections for bad auth results as we do in the normal
login request paths. Technically not 100% needed but if somehow the
handleCancelableRequest doesn't use the handleLoginRequest code path
we could get into trouble in the future
- Add delegated-auth-accessors flag to the secrets tune command and
api-docs
* Unit tests and some small fixes
* Remove transit preauth test, rely on unit tests
* Cleanup and add a little more commentary in tests
* Fix typos, add another failure use-case which we reference a disabled auth mount
* PR Feedback
- Use router to lookup mount instead of defining a new lookup method
- Enforce auth table types and namespace when mount is found
- Define a type alias for the handleInvalidCreds
- Fix typos/grammar
- Clean up globals in test
* Additional PR feedback
- Add test for delegated auth handler
- Force batch token usage
- Add a test to validate failures if a non-batch token is used
- Check for Data member being nil in test cases
* Update failure error message around requiring batch tokens
* Trap MFA requests
* Reword some error messages
* Add test and fixes for delegated response wrapping
* Move MFA test to dedicated mount
- If the delegated auth tests were running in parallel, the MFA test
case might influence the other tests, so move the MFA to a dedicated
mount
* PR feedback: use textproto.CanonicalMIMEHeaderKey
- Change the X-Vault-Wrap-Ttl constant to X-Vault-Wrap-TTL
and use textproto.CanonicalMIMEHeaderKey to format it
within the delete call.
- This protects the code around changes of the constant typing
* PR feedback
- Append Error to RequestDelegatedAuth
- Force error interface impl through explicit nil var assignment on
RequestDelegatedAuthError
- Clean up test factory and leverage NewTestSoloCluster
- Leverage newer maps.Clone as this is 1.16 only
---------
Co-authored-by: Scott G. Miller <smiller@hashicorp.com>
* create custom type for disable-replication-status-endpoints context key
make use of custom context key type in middleware function
* clean up code to remove various compiler warnings
unnecessary return statement
if condition that is always true
fix use of deprecated ioutil.NopCloser
empty if block
* remove unused unexported function
* clean up code
remove unnecessary nil check around a range expression
* clean up code
removed redundant return statement
* use http.StatusTemporaryRedirect constant instead of literal integer
* create custom type for context key for max_request_size parameter
* create custom type for context key for original request path
* reduce calls to DetermineRoleFromLoginRequest from 3 to 1 for aws auth method
* change ordering of LoginCreateToken args
* replace another determineRoleFromLoginRequest function with role from context
* add changelog
* Check for role in context if not there make call to DeteremineRoleFromLoginRequest
* move context role check below nanmespace check
* Update changelog/22583.txt
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
* revert signature to same order
* make sure resp is last argument
* retrieve role from context closer to where role variable is needed
* remove failsafe for role in mfa login
* Update changelog/22583.txt
---------
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
* Add header operation to sdk/logical
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add support for routing HEAD operations
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* port SSCT OSS
* port header hmac key to ent and generate token proto without make command
* remove extra nil check in request handling
* add changelog
* add comment to router.go
* change test var to use length constants
* remove local index is 0 check and extra defer which can be removed after use of ExternalID
* VAULT-1564 report in-flight requests
* adding a changelog
* Changing some variable names and fixing comments
* minor style change
* adding unauthenticated support for in-flight-req
* adding documentation for the listener.profiling stanza
* adding an atomic counter for the inflight requests
addressing comments
* addressing comments
* logging completed requests
* fixing a test
* providing log_requests_info as a config option to determine at which level requests should be logged
* removing a member and a method from the StatusHeaderResponseWriter struct
* adding api docks
* revert changes in NewHTTPResponseWriter
* Fix logging invalid log_requests_info value
* Addressing comments
* Fixing a test
* use an tomic value for logRequestsInfo, and moving the CreateClientID function to Core
* fixing go.sum
* minor refactoring
* protecting InFlightRequests from data race
* another try on fixing a data race
* another try to fix a data race
* addressing comments
* fixing couple of tests
* changing log_requests_info to log_requests_level
* minor style change
* fixing a test
* removing the lock in InFlightRequests
* use single-argument form for interface assertion
* adding doc for the new configuration paramter
* adding the new doc to the nav data file
* minor fix
* Unify NewHTTPResponseWriter ant NewStatusHeaderResponseWriter to fix ResponseWriter issues
* adding changelog
* removing unnecessary function from the WrappingResponseWriter interface
* changing logical requests responseWriter type
* reverting change to HTTPResponseWriter
* fix adding clientID to request in audit log
* fix boolean statement
* use standard encoding for client ID instead of urlEncoding
* change encoding in tests
* add in client counts to request handling
* remove redundant client ID generation in request handling
* directly add clientID to req after handling token usage
* handle HTTP PATCH requests as logical.PatchOperation
* update go.mod, go.sum
* a nil response for logical.PatchOperation should result in 404
* respond with 415 for incorrect MIME type in PATCH Content-Type header
* add abstraction to handle PatchOperation requests
* add ACLs for patch
* Adding JSON Merge support to the API client
* add HTTP PATCH tests to check high level response logic
* add permission-based 'kv patch' tests in prep to add HTTP PATCH
* adding more 'kv patch' CLI command tests
* fix TestHandler_Patch_NotFound
* Fix TestKvPatchCommand_StdinValue
* add audit log test for HTTP PATCH
* patch CLI changes
* add patch CLI tests
* change JSONMergePatch func to accept a ctx
* fix TestKVPatchCommand_RWMethodNotExists and TestKVPatchCommand_RWMethodSucceeds to specify -method flag
* go fmt
* add a test to verify patching works by default with the root token
* add changelog entry
* get vault-plugin-secrets-kv@add-patch-support
* PR feedback
* reorder some imports; go fmt
* add doc comment for HandlePatchOperation
* add json-patch@v5.5.0 to go.mod
* remove unnecessary cancelFunc for WriteBytes
* remove default for -method
* use stable version of json-patch; go mod tidy
* more PR feedback
* temp go get vault-plugin-secrets-kv@master until official release
Co-authored-by: Josh Black <raskchanky@users.noreply.github.com>
This allows logical operations (along with a non-nil response writer) to
process http handler funcs within the operation function while keeping
auth and audit checks that the logical request flow provides.
* upgrade aws roles
* test upgrade aws roles
* Initialize aws credential backend at mount time
* add a TODO
* create end-to-end test for builtin/credential/aws
* fix bug in initializer
* improve comments
* add Initialize() to logical.Backend
* use Initialize() in Core.enableCredentialInternal()
* use InitializeRequest to call Initialize()
* improve unit testing for framework.Backend
* call logical.Backend.Initialize() from all of the places that it needs to be called.
* implement backend.proto changes for logical.Backend.Initialize()
* persist current role storage version when upgrading aws roles
* format comments correctly
* improve comments
* use postUnseal funcs to initialize backends
* simplify test suite
* improve test suite
* simplify logic in aws role upgrade
* simplify aws credential initialization logic
* simplify logic in aws role upgrade
* use the core's activeContext for initialization
* refactor builtin/plugin/Backend
* use a goroutine to upgrade the aws roles
* misc improvements and cleanup
* do not run AWS role upgrade on DR Secondary
* always call logical.Backend.Initialize() when loading a plugin.
* improve comments
* on standbys and DR secondaries we do not want to run any kind of upgrade logic
* fix awsVersion struct
* clarify aws version upgrade
* make the upgrade logic for aws auth more explicit
* aws upgrade is now called from a switch
* fix fallthrough bug
* simplify logic
* simplify logic
* rename things
* introduce currentAwsVersion const to track aws version
* improve comments
* rearrange things once more
* conglomerate things into one function
* stub out aws auth initialize e2e test
* improve aws auth initialize e2e test
* finish aws auth initialize e2e test
* tinker with aws auth initialize e2e test
* tinker with aws auth initialize e2e test
* tinker with aws auth initialize e2e test
* fix typo in test suite
* simplify logic a tad
* rearrange assignment
* Fix a few lifecycle related issues in #7025 (#7075)
* Fix panic when plugin fails to load
* Work on raft backend
* Add logstore locally
* Add encryptor and unsealable interfaces
* Add clustering support to raft
* Remove client and handler
* Bootstrap raft on init
* Cleanup raft logic a bit
* More raft work
* Work on TLS config
* More work on bootstrapping
* Fix build
* More work on bootstrapping
* More bootstrapping work
* fix build
* Remove consul dep
* Fix build
* merged oss/master into raft-storage
* Work on bootstrapping
* Get bootstrapping to work
* Clean up FMS and node-id
* Update local node ID logic
* Cleanup node-id change
* Work on snapshotting
* Raft: Add remove peer API (#906)
* Add remove peer API
* Add some comments
* Fix existing snapshotting (#909)
* Raft get peers API (#912)
* Read raft configuration
* address review feedback
* Use the Leadership Transfer API to step-down the active node (#918)
* Raft join and unseal using Shamir keys (#917)
* Raft join using shamir
* Store AEAD instead of master key
* Split the raft join process to answer the challenge after a successful unseal
* get the follower to standby state
* Make unseal work
* minor changes
* Some input checks
* reuse the shamir seal access instead of new default seal access
* refactor joinRaftSendAnswer function
* Synchronously send answer in auto-unseal case
* Address review feedback
* Raft snapshots (#910)
* Fix existing snapshotting
* implement the noop snapshotting
* Add comments and switch log libraries
* add some snapshot tests
* add snapshot test file
* add TODO
* More work on raft snapshotting
* progress on the ConfigStore strategy
* Don't use two buckets
* Update the snapshot store logic to hide the file logic
* Add more backend tests
* Cleanup code a bit
* [WIP] Raft recovery (#938)
* Add recovery functionality
* remove fmt.Printfs
* Fix a few fsm bugs
* Add max size value for raft backend (#942)
* Add max size value for raft backend
* Include physical.ErrValueTooLarge in the message
* Raft snapshot Take/Restore API (#926)
* Inital work on raft snapshot APIs
* Always redirect snapshot install/download requests
* More work on the snapshot APIs
* Cleanup code a bit
* On restore handle special cases
* Use the seal to encrypt the sha sum file
* Add sealer mechanism and fix some bugs
* Call restore while state lock is held
* Send restore cb trigger through raft log
* Make error messages nicer
* Add test helpers
* Add snapshot test
* Add shamir unseal test
* Add more raft snapshot API tests
* Fix locking
* Change working to initalize
* Add underlying raw object to test cluster core
* Move leaderUUID to core
* Add raft TLS rotation logic (#950)
* Add TLS rotation logic
* Cleanup logic a bit
* Add/Remove from follower state on add/remove peer
* add comments
* Update more comments
* Update request_forwarding_service.proto
* Make sure we populate all nodes in the followerstate obj
* Update times
* Apply review feedback
* Add more raft config setting (#947)
* Add performance config setting
* Add more config options and fix tests
* Test Raft Recovery (#944)
* Test raft recovery
* Leave out a node during recovery
* remove unused struct
* Update physical/raft/snapshot_test.go
* Update physical/raft/snapshot_test.go
* fix vendoring
* Switch to new raft interface
* Remove unused files
* Switch a gogo -> proto instance
* Remove unneeded vault dep in go.sum
* Update helper/testhelpers/testhelpers.go
Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* Update vault/cluster/cluster.go
* track active key within the keyring itself (#6915)
* track active key within the keyring itself
* lookup and store using the active key ID
* update docstring
* minor refactor
* Small text fixes (#6912)
* Update physical/raft/raft.go
Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* review feedback
* Move raft logical system into separate file
* Update help text a bit
* Enforce cluster addr is set and use it for raft bootstrapping
* Fix tests
* fix http test panic
* Pull in latest raft-snapshot library
* Add comment