This PR introduces a new testonly endpoint for introspecting the
RequestLimiter state. It makes use of the endpoint to verify that changes to
the request_limiter config are honored across reload.
In the future, we may choose to make the sys/internal/request-limiter/status
endpoint available in normal binaries, but this is an expedient way to expose
the status for testing without having to rush the design.
In order to re-use as much of the existing command package utility funcionality
as possible without introducing sprawling code changes, I introduced a new
server_util.go and exported some fields via accessors.
The tests shook out a couple of bugs (including a deadlock and lack of
locking around the core limiterRegistry state).
This commit introduces two new adaptive concurrency limiters in Vault,
which should handle overloading of the server during periods of
untenable request rate. The limiter adjusts the number of allowable
in-flight requests based on latency measurements performed across the
request duration. This approach allows us to reject entire requests
prior to doing any work and prevents clients from exceeding server
capacity.
The limiters intentionally target two separate vectors that have been
proven to lead to server over-utilization.
- Back pressure from the storage backend, resulting in bufferbloat in
the WAL system. (enterprise)
- Back pressure from CPU over-utilization via PKI issue requests
(specifically for RSA keys), resulting in failed heartbeats.
Storage constraints can be accounted for by limiting logical requests
according to their http.Method. We only limit requests with write-based
methods, since these will result in storage Puts and exhibit the
aforementioned bufferbloat.
CPU constraints are accounted for using the same underlying library and
technique; however, they require special treatment. The maximum number
of concurrent pki/issue requests found in testing (again, specifically
for RSA keys) is far lower than the minimum tolerable write request
rate. Without separate limiting, we would artificially impose limits on
tolerable request rates for non-PKI requests. To specifically target PKI
issue requests, we add a new PathsSpecial field, called limited,
allowing backends to specify a list of paths which should get
special-case request limiting.
For the sake of code cleanliness and future extensibility, we introduce
the concept of a LimiterRegistry. The registry proposed in this PR has
two entries, corresponding with the two vectors above. Each Limiter
entry has its own corresponding maximum and minimum concurrency,
allowing them to react to latency deviation independently and handle
high volumes of requests to targeted bottlenecks (CPU and storage).
In both cases, utilization will be effectively throttled before Vault
reaches any degraded state. The resulting 503 - Service Unavailable is a
retryable HTTP response code, which can be handled to gracefully retry
and eventually succeed. Clients should handle this by retrying with
jitter and exponential backoff. This is done within Vault's API, using
the go-retryablehttp library.
Limiter testing was performed via benchmarks of mixed workloads and
across a deployment of agent pods with great success.
Adds the ability to pin a version for a specific plugin type + name to enable an easier plugin upgrade UX. After pinning and reloading, that version should be the only version in use.
No HTTP API implementation yet for managing pins, so no user-facing effects yet.
- As TestInteg_KMIP_Audit showed, the x509.Certificate's
big.Int SerialNumber is mangled when we do a deep clone
of the LogInput's Request TLSConnection object.
- As the tls.ConnectionState does not have a Clone itself and
we don't modify this field, it should be safe to just grab
the existing reference into the cloned version.
* audit: entry_formatter update to ensure no race detection issues
* in progress with looking at a clone method for LogInput
* Tidy up LogInput Clone method
* less memory allocation
* fix hmac key clone
We're on a quest to reduce our pipeline execution time to both enhance
our developer productivity but also to reduce the overall cost of the CI
pipeline. The strategy we use here reduces workflow execution time and
network I/O cost by reducing our module cache size and using binary
external tools when possible. We no longer download modules and build
many of the external tools thousands of times a day.
Our previous process of installing internal and external developer tools
was scattered and inconsistent. Some tools were installed via `go
generate -tags tools ./tools/...`,
others via various `make` targets, and some only in Github Actions
workflows. This process led to some undesirable side effects:
* The modules of some dev and test tools were included with those
of the Vault project. This leads to us having to manage our own
Go modules with those of external tools. Prior to Go 1.16 this
was the recommended way to handle external tools, but now
`go install tool@version` is the recommended way to handle
external tools that need to be build from source as it supports
specific versions but does not modify the go.mod.
* Due to Github cache constraints we combine our build and test Go
module caches together, but having our developer tools as deps in
our module results in a larger cache which is downloaded on every
build and test workflow runner. Removing the external tools that were
included in our go.mod reduced the expanded module cache by size
by ~300MB, thus saving time and network I/O costs when downloading
the module cache.
* Not all of our developer tools were included in our modules. Some were
being installed with `go install` or `go run`, so they didn't take
advantage of a single module cache. This resulted in us downloading
Go modules on every CI and Build runner in order to build our
external tools.
* Building our developer tools from source in CI is slow. Where possible
we can prefer to use pre-built binaries in CI workflows. No more
module download or tool compiles if we can avoid them.
I've refactored how we define internal and external build tools
in our Makefile and added several new targets to handle both building
the developer tools locally for development and verifying that they are
available. This allows for an easy developer bootstrap while also
supporting installation of many of the external developer tools from
pre-build binaries in CI. This reduces our network IO and run time
across nearly all of our actions runners.
While working on this I caught and resolved a few unrelated issue:
* Both our Go and Proto format checks we're being run incorrectly. In
CI they we're writing changes but not failing if changes were
detected. The Go was less of a problem as we have git hooks that
are intended to enforce formatting, however we drifted over time.
* Our Git hooks couldn't handle removing a Go file without failing. I
moved the diff check into the new Go helper and updated it to handle
removing files.
* I combined a few separate scripts and into helpers and added a few
new capabilities.
* I refactored how we install Go modules to make it easier to download
and tidy all of the projects go.mod's.
* Refactor our internal and external tool installation and verification
into a tools.sh helper.
* Combined more complex Go verification into `scripts/go-helper.sh` and
utilize it in the `Makefile` and git commit hooks.
* Add `Makefile` targets for executing our various tools.sh helpers.
* Update our existing `make` targets to use new tool targets.
* Normalize our various scripts and targets output to have a consistent
output format.
* In CI, install many of our external dependencies as binaries wherever
possible. When not possible we'll build them from scratch but not mess
with the shared module cache.
* [QT-641] Remove our external build tools from our project Go modules.
* [QT-641] Remove extraneous `go list`'s from our `set-up-to` composite
action.
* Fix formatting and regen our protos
Signed-off-by: Ryan Cragun <me@ryan.ec>
* VAULT-22481: Audit filter node (#24465)
* Initial commit on adding filter nodes for audit
* tests for audit filter
* test: longer filter - more conditions
* copywrite headers
* Check interface for the right type
* Add audit filtering feature (#24554)
* Support filter nodes in backend factories and add some tests
* More tests and cleanup
* Attempt to move control of registration for nodes and pipelines to the audit broker (#24505)
* invert control of the pipelines/nodes to the audit broker vs. within each backend
* update noop audit test code to implement the pipeliner interface
* noop mount path has trailing slash
* attempting to make NoopAudit more friendly
* NoopAudit uses known salt
* Refactor audit.ProcessManual to support filter nodes
* HasFiltering
* rename the pipeliner
* use exported AuditEvent in Filter
* Add tests for registering and deregistering backends on the audit broker
* Add missing licence header to one file, fix a typo in two tests
---------
Co-authored-by: Peter Wilson <peter.wilson@hashicorp.com>
* Add changelog file
* update bexpr datum to use a strong type
* go docs updates
* test path
* PR review comments
* handle scenarios/outcomes from broker.send
* don't need to re-check the complete sinks
* add extra check to deregister to ensure that re-registering non-filtered device sets sink threshold
* Ensure that the multierror is appended before attempting to return it
---------
Co-authored-by: Peter Wilson <peter.wilson@hashicorp.com>
* Implement custom-message management endpoints in a namespace aware manner
* completion of non-enterprise version of custom-messages
* clean up of error handling and fixing a nil pointer error
* rename UICustomMessagesEntry to UICustomMessageEntry
* add unit tests to cover new functions in UIConfig related to custom messages
* unit tests for all custom message handling
* add missing header comments for new files
* add changelog file
* fix test setup error that led to unexpected failure
* change return type from slice of pointers to struct to slice of struct and add godocs to every function
* add Internal suffix to internal methods for the UIConfig struct
* add validation for start and end times of custom messages
* improvements based on review feedback
* explore new approach for custom messages
* introduce new error to force HTTP 404 when referencing non-existant UI custom message
* remove changelog entry until feature is complete
* implement CRUD endpoints using single storage entry per namespace
* add mutex to protect operations that read the storage entry and write it back
* add copyright header comment to new files
* fix failing tests due to change in target function behaviour in order to return 404 error when mandated
* feedback from review plus some improvements on my own as well
* define constants for recognized message types and replace hardcoded strings occurrences with new constants
* incorporate feedback comment
* beef up testing with non-root namespaces in putEntry and getEntryForNamespace
* renaming CreateMessage to AddMessage in uicustommessages.Manager and uicustommessages.Entry
* adding missing copyright header comments
* wip
* Work on the tuneable allowance and some bugs
* Call handleCancellableRequest instead, which gets the audit order more correct and includes the preauth response
* Get rid of no longer needed operation
* Phew, this wasn't necessary
* Add auth error handling by the backend, and fix a bug with handleInvalidCredentials
* Cleanup req/resp naming
* Use the new form, and data
* Discovered that tokens werent really being checked because isLoginRequest returns true for the re-request into the backend, when it shouldnt
* Add a few more checks in the delegated request handler for bad inputs
- Protect the delegated handler from bad inputs from the backend such
as an empty accessor, a path that isn't registered as a login request
- Add similar protections for bad auth results as we do in the normal
login request paths. Technically not 100% needed but if somehow the
handleCancelableRequest doesn't use the handleLoginRequest code path
we could get into trouble in the future
- Add delegated-auth-accessors flag to the secrets tune command and
api-docs
* Unit tests and some small fixes
* Remove transit preauth test, rely on unit tests
* Cleanup and add a little more commentary in tests
* Fix typos, add another failure use-case which we reference a disabled auth mount
* PR Feedback
- Use router to lookup mount instead of defining a new lookup method
- Enforce auth table types and namespace when mount is found
- Define a type alias for the handleInvalidCreds
- Fix typos/grammar
- Clean up globals in test
* Additional PR feedback
- Add test for delegated auth handler
- Force batch token usage
- Add a test to validate failures if a non-batch token is used
- Check for Data member being nil in test cases
* Update failure error message around requiring batch tokens
* Trap MFA requests
* Reword some error messages
* Add test and fixes for delegated response wrapping
* Move MFA test to dedicated mount
- If the delegated auth tests were running in parallel, the MFA test
case might influence the other tests, so move the MFA to a dedicated
mount
* PR feedback: use textproto.CanonicalMIMEHeaderKey
- Change the X-Vault-Wrap-Ttl constant to X-Vault-Wrap-TTL
and use textproto.CanonicalMIMEHeaderKey to format it
within the delete call.
- This protects the code around changes of the constant typing
* PR feedback
- Append Error to RequestDelegatedAuth
- Force error interface impl through explicit nil var assignment on
RequestDelegatedAuthError
- Clean up test factory and leverage NewTestSoloCluster
- Leverage newer maps.Clone as this is 1.16 only
---------
Co-authored-by: Scott G. Miller <smiller@hashicorp.com>
* Re-implementation of API redirects with more deterministic matching
* add missing file
* Handle query params properly
* licensing
* Add single src deregister
* Implement specifically RFC 5785 (.well-known) redirects.
Also implement a unit test for HA setups, making sure the standby node redirects to the active (as usual), and that then the active redirects the .well-known request to a backend, and that that is subsequently satisfied.
* Remove test code
* Rename well known redirect logic
* comments/cleanup
* PR feedback
* Remove wip typo
* Update http/handler.go
Co-authored-by: Steven Clark <steven.clark@hashicorp.com>
* Fix registrations with trailing slashes
---------
Co-authored-by: Steven Clark <steven.clark@hashicorp.com>
* wip
* more pruning
* Integrate OCSP into binary paths PoC
- Simplify some of the changes to the router
- Remove the binary test PKI endpoint
- Switch OCSP to use the new binary paths backend variable
* Fix proto generation and test compilation
* Add unit test for binary request handling
---------
Co-authored-by: Scott G. Miller <smiller@hashicorp.com>
* create custom type for disable-replication-status-endpoints context key
make use of custom context key type in middleware function
* clean up code to remove various compiler warnings
unnecessary return statement
if condition that is always true
fix use of deprecated ioutil.NopCloser
empty if block
* remove unused unexported function
* clean up code
remove unnecessary nil check around a range expression
* clean up code
removed redundant return statement
* use http.StatusTemporaryRedirect constant instead of literal integer
* create custom type for context key for max_request_size parameter
* create custom type for context key for original request path
Subscribing to events through a WebSocket now support boolean
expressions to filter only the events wanted based on the fields
* `event_type`
* `operation`
* `source_plugin_mount`
* `data_path`
* `namespace`
Example expressions:
These can be passed to `vault events subscribe`, e.g.,:
* `event_type == abc`
* `source_plugin_mount == secret/`
* `event_type != def and operation != write`
```sh
vault events subscribe -filter='source_plugin_mount == secret/' 'kv*'
```
The docs for the `vault events subscribe` command and API endpoint
will be coming shortly in a different PR, and will include a better
specification for these expressions, similar to (or linking to)
https://developer.hashicorp.com/boundary/docs/concepts/filtering
* reduce calls to DetermineRoleFromLoginRequest from 3 to 1 for aws auth method
* change ordering of LoginCreateToken args
* replace another determineRoleFromLoginRequest function with role from context
* add changelog
* Check for role in context if not there make call to DeteremineRoleFromLoginRequest
* move context role check below nanmespace check
* Update changelog/22583.txt
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
* revert signature to same order
* make sure resp is last argument
* retrieve role from context closer to where role variable is needed
* remove failsafe for role in mfa login
* Update changelog/22583.txt
---------
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
Biggest change: we rename `Send` to `SendEvent` in `logical.EventSender`..
Initially we picked `Send` to match the underlying go-eventlogger
broker's `Send` method, and to avoid the stuttering of `events.SendEvent`.
However, I think it is more useful for the `logical.EventSender`
interface to use the method `SendEvent` so that, for example,
`framework.Backend` can implement it.
This is a relatively change now that should not affect anything
except the KV plugin, which is being fixed in another PR.
Another change: if the `secret_path` metadata is present, then
the plugin-aware `EventBus` will prepend it with the plugin mount.
This allows the `secret_path` to be the full path to any referenced
secret.
This change is also backwards compatible, since this field was not
present in the KV plugin. (It did use the slightly different `path`
field, which we can keep for now.)
* Migrate protobuf generation to Buf
Buf simplifies the generation story and allows us to lean
into other features in the Buf ecosystem, such as dependency
management, linting, breaking change detection, formatting
and remote plugins.
* Format all protobuf files with buf
Also add a CI job to ensure formatting remains consistent
* Add CI job to warn on proto generate diffs
Some files were not regenerated with the latest version
of the protobuf binary. This CI job will ensure we are always
detect if the protobuf files need regenerating.
* Add CI job for linting protobuf files
* add APILockShouldBlockRequest to backend proto
* make proto
* add APILockShouldBlockRequest to system view
* Revert "make proto"
This reverts commit 7f33733f185e8a7419590d82150e85abdcc5e707.
* Revert "add APILockShouldBlockRequest to backend proto"
This reverts commit a3bf41f7f2a0811dd323fbff4da45a582c942f2b.
* move APILockShouldBlockRequest to extended sys view
* add changelog entry
* Add stub ACME billing interfaces
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add initial implementation of client count
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Correctly attribute to mount, namespace
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Refactor adding entities of custom types
This begins to add custom types of events; presently these are counted
as non-entity tokens, but prefixed with a custom ClientID prefix.
In the future, this will be the basis for counting these events
separately (into separate buckets and separate storage segments).
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Refactor creation of ACME mounts
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add test case for billing
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Better support managed key system view casting
Without an additional parameter, SystemView could be of a different
internal implementation type that cannot be directly casted to in OSS.
Use a separate parameter for the managed key system view to use instead.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Refactor creation of mounts for enterprise
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Validate mounts in ACME billing tests
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use a hopefully unique separator for encoded identifiers
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use mount accesor, not path
Co-authored-by: miagilepner <mia.epner@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Rename AddEventToFragment->AddActivityToFragment
Co-authored-by: Mike Palmiotto <mike.palmiotto@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Co-authored-by: miagilepner <mia.epner@hashicorp.com>
Co-authored-by: Mike Palmiotto <mike.palmiotto@hashicorp.com>
* Add header operation to sdk/logical
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add support for routing HEAD operations
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
The [WebSockets spec](https://www.rfc-editor.org/rfc/rfc6455) states
that text messages must be valid UTF-8 encoded strings, which protobuf
messages virtually never are. This now correctly sends the protobuf events
as binary messages.
We change the format to correspond to CloudEvents, as originally intended,
and remove a redundant timestamp and newline.
We also bump the eventlogger to fix a race condition that this code triggers.
Also updates the event receieved to include a timestamp.
Websockets support both JSON and protobuf binary formats.
This can be used by either `wscat` or the new
`vault events subscribe`:
e.g.,
```sh
$ wscat -H "X-Vault-Token: $(vault print token)" --connect ws://127.0.0.1:8200/v1/sys/events/subscribe/abc?json=true
{"event":{"id":"5c5c8c83-bf43-7da5-fe88-fc3cac814b2e", "note":"testing"}, "eventType":"abc", "timestamp":"2023-02-07T18:40:50.598408Z"}
...
```
and
```sh
$ vault events subscribe abc
{"event":{"id":"5c5c8c83-bf43-7da5-fe88-fc3cac814b2e", "note":"testing"}, "eventType":"abc", "timestamp":"2023-02-07T18:40:50.598408Z"}
...
```
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
This isn't perfect for sure, but it's solidifying and becoming a useful
base to work off.
This routes events sent from auth and secrets plugins to the main
`EventBus` in the Vault Core. Events sent from plugins are automatically
tagged with the namespace and plugin information associated with them.
* Adds managed key usages for MAC generate/verify and RNG.
* Remove MAC-related key usages from managed key in favor of sign/verify.
* Remove context from random source managed key interface.
* named MFA method configurations
* fix a test
* CL
* fix an issue with same config name different ID and add a test
* feedback
* feedback on test
* consistent use of passcode for all MFA methods (#18611)
* make use of passcode factor consistent for all MFA types
* improved type for MFA factors
* add method name to login CLI
* minor refactoring
* only accept MFA method name with its namespace path in the login request MFA header
* fix a bug
* fixing an ErrorOrNil return value
* more informative error message
* Apply suggestions from code review
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
* feedback
* test refactor a bit
* adding godoc for a test
* feedback
* remove sanitize method name
* guard a possbile nil ref
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
* Add WriteForwardedStorage to sdk's plugin, logical in OSS
This should allow backends to specify paths to forward write
(storage.Put(...) and storage.Delete(...)) operations for.
Notably, these semantics are subject to change and shouldn't yet be
relied on.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Collect paths for write forwarding in OSS
This adds a path manager to Core, allowing tracking across all Vault
versions of paths which could use write forwarding if available. In
particular, even on OSS offerings, we'll need to template {{clusterId}}
into the paths, in the event of later upgrading to Enterprise. If we
didn't, we'd end up writing paths which will no longer be accessible
post-migration, due to write forwarding now replacing the sentinel with
the actual cluster identifier.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add forwarded writer implementation to OSS
Here, for paths given to us, we determine if we need to do cluster
translation and perform local writing. This is the OSS variant.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Wire up mount-specific request forwarding in OSS
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Clarify that state lock needs to be held to call HAState in OSS
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Move cluster sentinel constant to sdk/logical
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Expose ClusterID to Plugins via SystemView
This will let plugins learn what the Cluster's ID is, without having to
resort to hacks like writing a random string to its cluster-prefixed
namespace and then reading it once it has replicated.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add GRPC ClusterID implementation
For any external plugins which wish to use it.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>