* add gosimport to make fmt and run it
* move installation to tools.sh
* correct weird spacing issue
* Update Makefile
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
* fix a weird issue
---------
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
* Make plugin-specific env take precedence over sys env
* Expand the existing plugin env integration test
---------
Co-authored-by: Austin Gebauer <34121980+austingebauer@users.noreply.github.com>
This commit introduces two new adaptive concurrency limiters in Vault,
which should handle overloading of the server during periods of
untenable request rate. The limiter adjusts the number of allowable
in-flight requests based on latency measurements performed across the
request duration. This approach allows us to reject entire requests
prior to doing any work and prevents clients from exceeding server
capacity.
The limiters intentionally target two separate vectors that have been
proven to lead to server over-utilization.
- Back pressure from the storage backend, resulting in bufferbloat in
the WAL system. (enterprise)
- Back pressure from CPU over-utilization via PKI issue requests
(specifically for RSA keys), resulting in failed heartbeats.
Storage constraints can be accounted for by limiting logical requests
according to their http.Method. We only limit requests with write-based
methods, since these will result in storage Puts and exhibit the
aforementioned bufferbloat.
CPU constraints are accounted for using the same underlying library and
technique; however, they require special treatment. The maximum number
of concurrent pki/issue requests found in testing (again, specifically
for RSA keys) is far lower than the minimum tolerable write request
rate. Without separate limiting, we would artificially impose limits on
tolerable request rates for non-PKI requests. To specifically target PKI
issue requests, we add a new PathsSpecial field, called limited,
allowing backends to specify a list of paths which should get
special-case request limiting.
For the sake of code cleanliness and future extensibility, we introduce
the concept of a LimiterRegistry. The registry proposed in this PR has
two entries, corresponding with the two vectors above. Each Limiter
entry has its own corresponding maximum and minimum concurrency,
allowing them to react to latency deviation independently and handle
high volumes of requests to targeted bottlenecks (CPU and storage).
In both cases, utilization will be effectively throttled before Vault
reaches any degraded state. The resulting 503 - Service Unavailable is a
retryable HTTP response code, which can be handled to gracefully retry
and eventually succeed. Clients should handle this by retrying with
jitter and exponential backoff. This is done within Vault's API, using
the go-retryablehttp library.
Limiter testing was performed via benchmarks of mixed workloads and
across a deployment of agent pods with great success.
We're on a quest to reduce our pipeline execution time to both enhance
our developer productivity but also to reduce the overall cost of the CI
pipeline. The strategy we use here reduces workflow execution time and
network I/O cost by reducing our module cache size and using binary
external tools when possible. We no longer download modules and build
many of the external tools thousands of times a day.
Our previous process of installing internal and external developer tools
was scattered and inconsistent. Some tools were installed via `go
generate -tags tools ./tools/...`,
others via various `make` targets, and some only in Github Actions
workflows. This process led to some undesirable side effects:
* The modules of some dev and test tools were included with those
of the Vault project. This leads to us having to manage our own
Go modules with those of external tools. Prior to Go 1.16 this
was the recommended way to handle external tools, but now
`go install tool@version` is the recommended way to handle
external tools that need to be build from source as it supports
specific versions but does not modify the go.mod.
* Due to Github cache constraints we combine our build and test Go
module caches together, but having our developer tools as deps in
our module results in a larger cache which is downloaded on every
build and test workflow runner. Removing the external tools that were
included in our go.mod reduced the expanded module cache by size
by ~300MB, thus saving time and network I/O costs when downloading
the module cache.
* Not all of our developer tools were included in our modules. Some were
being installed with `go install` or `go run`, so they didn't take
advantage of a single module cache. This resulted in us downloading
Go modules on every CI and Build runner in order to build our
external tools.
* Building our developer tools from source in CI is slow. Where possible
we can prefer to use pre-built binaries in CI workflows. No more
module download or tool compiles if we can avoid them.
I've refactored how we define internal and external build tools
in our Makefile and added several new targets to handle both building
the developer tools locally for development and verifying that they are
available. This allows for an easy developer bootstrap while also
supporting installation of many of the external developer tools from
pre-build binaries in CI. This reduces our network IO and run time
across nearly all of our actions runners.
While working on this I caught and resolved a few unrelated issue:
* Both our Go and Proto format checks we're being run incorrectly. In
CI they we're writing changes but not failing if changes were
detected. The Go was less of a problem as we have git hooks that
are intended to enforce formatting, however we drifted over time.
* Our Git hooks couldn't handle removing a Go file without failing. I
moved the diff check into the new Go helper and updated it to handle
removing files.
* I combined a few separate scripts and into helpers and added a few
new capabilities.
* I refactored how we install Go modules to make it easier to download
and tidy all of the projects go.mod's.
* Refactor our internal and external tool installation and verification
into a tools.sh helper.
* Combined more complex Go verification into `scripts/go-helper.sh` and
utilize it in the `Makefile` and git commit hooks.
* Add `Makefile` targets for executing our various tools.sh helpers.
* Update our existing `make` targets to use new tool targets.
* Normalize our various scripts and targets output to have a consistent
output format.
* In CI, install many of our external dependencies as binaries wherever
possible. When not possible we'll build them from scratch but not mess
with the shared module cache.
* [QT-641] Remove our external build tools from our project Go modules.
* [QT-641] Remove extraneous `go list`'s from our `set-up-to` composite
action.
* Fix formatting and regen our protos
Signed-off-by: Ryan Cragun <me@ryan.ec>
* wip
* more pruning
* Integrate OCSP into binary paths PoC
- Simplify some of the changes to the router
- Remove the binary test PKI endpoint
- Switch OCSP to use the new binary paths backend variable
* Fix proto generation and test compilation
* Add unit test for binary request handling
---------
Co-authored-by: Scott G. Miller <smiller@hashicorp.com>
Biggest change: we rename `Send` to `SendEvent` in `logical.EventSender`..
Initially we picked `Send` to match the underlying go-eventlogger
broker's `Send` method, and to avoid the stuttering of `events.SendEvent`.
However, I think it is more useful for the `logical.EventSender`
interface to use the method `SendEvent` so that, for example,
`framework.Backend` can implement it.
This is a relatively change now that should not affect anything
except the KV plugin, which is being fixed in another PR.
Another change: if the `secret_path` metadata is present, then
the plugin-aware `EventBus` will prepend it with the plugin mount.
This allows the `secret_path` to be the full path to any referenced
secret.
This change is also backwards compatible, since this field was not
present in the KV plugin. (It did use the slightly different `path`
field, which we can keep for now.)
* Migrate protobuf generation to Buf
Buf simplifies the generation story and allows us to lean
into other features in the Buf ecosystem, such as dependency
management, linting, breaking change detection, formatting
and remote plugins.
* Format all protobuf files with buf
Also add a CI job to ensure formatting remains consistent
* Add CI job to warn on proto generate diffs
Some files were not regenerated with the latest version
of the protobuf binary. This CI job will ensure we are always
detect if the protobuf files need regenerating.
* Add CI job for linting protobuf files
* CreateOperation should only be implemented alongside ExistenceCheck
Closes#12329
Vault treats all POST or PUT HTTP requests equally - they default to
being treated as UpdateOperations, but, if a backend implements an
ExistenceCheck function, CreateOperations can be separated out when the
existence check returns false.
It follows, then, that if a CreateOperation handler is implemented
without an ExistenceCheck function, this is unreachable code - a coding
error. It's a fairly minor error in the grand scheme of things, but it
causes the generated OpenAPI spec to include x-vault-createSupported for
operations on which create can never actually be invoked - and promotes
muddled understanding of the create/update feature.
In this PR:
1) Implement a new test, which checks all builtin auth methods and
secrets engines can be successfully initialized. (This is important
to validate the next part.)
2) Expand upon the existing coding error checks built in to
framework.Backend, adding a check for this misuse of CreateOperation.
3) Fix up instances of improper CreateOperation within the Vault
repository - just two, transit and mock.
Note: At this point, the newly added test will **fail**.
There are improper uses of CreateOperation in all of the following:
vault-plugin-auth-cf
vault-plugin-auth-kerberos
vault-plugin-auth-kubernetes
vault-plugin-secrets-ad
vault-plugin-secrets-gcpkms
vault-plugin-secrets-kubernetes
vault-plugin-secrets-kv
vault-plugin-secrets-openldap
vault-plugin-secrets-terraform
each of which needs to be fixed and updated in go.mod here, before this
new check can be added.
* Add subtests
* Add in testing of KV v2, which otherwise doesn't get tested
This is a surprisingly complicated special case
* The database plugin needs special handling as well, and add in help invocations of the builtin backends too
* Fix extra package prefix
* Add changelog
* Update 6 out of 9 plugins to needed new versions
Note, this IS an upgrade despite the apparent version numbers going
down. (That's a consequence of slightly odd release management occurring
in the plugin repositories.)
* Update to deal with code changes since branch originally created
* Perform necessary update of vault-plugin-secrets-kubernetes so that CI checks on PR can run
* Fix another instance of incorrect CreateOperation, for a test-only endpoint
By being hidden behind a Go build constraint, it had evaded notice until
now.
* Add an opportunistic test of sys/internal/specs/openapi too
This isn't perfect for sure, but it's solidifying and becoming a useful
base to work off.
This routes events sent from auth and secrets plugins to the main
`EventBus` in the Vault Core. Events sent from plugins are automatically
tagged with the namespace and plugin information associated with them.
* Add WriteForwardedStorage to sdk's plugin, logical in OSS
This should allow backends to specify paths to forward write
(storage.Put(...) and storage.Delete(...)) operations for.
Notably, these semantics are subject to change and shouldn't yet be
relied on.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Collect paths for write forwarding in OSS
This adds a path manager to Core, allowing tracking across all Vault
versions of paths which could use write forwarding if available. In
particular, even on OSS offerings, we'll need to template {{clusterId}}
into the paths, in the event of later upgrading to Enterprise. If we
didn't, we'd end up writing paths which will no longer be accessible
post-migration, due to write forwarding now replacing the sentinel with
the actual cluster identifier.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add forwarded writer implementation to OSS
Here, for paths given to us, we determine if we need to do cluster
translation and perform local writing. This is the OSS variant.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Wire up mount-specific request forwarding in OSS
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Clarify that state lock needs to be held to call HAState in OSS
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Move cluster sentinel constant to sdk/logical
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Expose ClusterID to Plugins via SystemView
This will let plugins learn what the Cluster's ID is, without having to
resort to hacks like writing a random string to its cluster-prefixed
namespace and then reading it once it has replicated.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add GRPC ClusterID implementation
For any external plugins which wish to use it.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* add Link config, init, and capabilities
* add node status proto
* bump protoc version to 3.21.9
* make proto
* adding link tests
* remove wrapped link
* add changelog entry
* update changelog entry
Move version out of SDK. For now it's a copy rather than move: the part not addressed by this change is sdk/helper/useragent.String, which we'll want to remove in favour of PluginString. That will have to wait until we've removed uses of useragent.String from all builtins.
And return an error instead of panicking.
This situation can occur if a plugin attempts to access the system
view during setup when Vault is checking the plugin metadata.
Fixes#17878.
Change the multiplexing key to use all `PluginRunner` config (converted to a struct which is comparable), so that plugins with the same name but different env, args, types, versions etc are not incorrectly multiplexed together.
Co-authored-by: Christopher Swenson <christopher.swenson@hashicorp.com>
Add plugin version to GRPC interface
Added a version interface in the sdk/logical so that it can be shared between all plugin types, and then wired it up to RunningVersion in the mounts, auth list, and database systems.
I've tested that this works with auth, database, and secrets plugin types, with the following logic to populate RunningVersion:
If a plugin has a PluginVersion() method implemented, then that is used
If not, and the plugin is built into the Vault binary, then the go.mod version is used
Otherwise, the it will be the empty string.
My apologies for the length of this PR.
* Placeholder backend should be external
We use a placeholder backend (previously a framework.Backend) before a
GRPC plugin is lazy-loaded. This makes us later think the plugin is a
builtin plugin.
So we added a `placeholderBackend` type that overrides the
`IsExternal()` method so that later we know that the plugin is external,
and don't give it a default builtin version.
* Support version selection for database plugins
* Don't consider unversioned plugins for version selection algorithm
* Added version to 'plugin not found' error
* Add PluginFactoryVersion function to avoid changing sdk/ API
* enable registering backend muxed plugins in plugin catalog
* set the sysview on the pluginconfig to allow enabling secrets/auth plugins
* store backend instances in map
* store single implementations in the instances map
cleanup instance map and ensure we don't deadlock
* fix system backend unit tests
move GetMultiplexIDFromContext to pluginutil package
fix pluginutil test
fix dbplugin ut
* return error(s) if we can't get the plugin client
update comments
* refactor/move GetMultiplexIDFromContext test
* add changelog
* remove unnecessary field on pluginClient
* add unit tests to PluginCatalog for secrets/auth plugins
* fix comment
* return pluginClient from TestRunTestPlugin
* add multiplexed backend test
* honor metadatamode value in newbackend pluginconfig
* check that connection exists on cleanup
* add automtls to secrets/auth plugins
* don't remove apiclientmeta parsing
* use formatting directive for fmt.Errorf
* fix ut: remove tls provider func
* remove tlsproviderfunc from backend plugin tests
* use env var to prevent test plugin from running as a unit test
* WIP: remove lazy loading
* move non lazy loaded backend to new package
* use version wrapper for backend plugin factory
* remove backendVersionWrapper type
* implement getBackendPluginType for plugin catalog
* handle backend plugin v4 registration
* add plugin automtls env guard
* modify plugin factory to determine the backend to use
* remove old pluginsets from v5 and log pid in plugin catalog
* add reload mechanism via context
* readd v3 and v4 to pluginset
* call cleanup from reload if non-muxed
* move v5 backend code to new package
* use context reload for for ErrPluginShutdown case
* add wrapper on v5 backend
* fix run config UTs
* fix unit tests
- use v4/v5 mapping for plugin versions
- fix test build err
- add reload method on fakePluginClient
- add multiplexed cases for integration tests
* remove comment and update AutoMTLS field in test
* remove comment
* remove errwrap and unused context
* only support metadatamode false for v5 backend plugins
* update plugin catalog errors
* use const for env variables
* rename locks and remove unused
* remove unneeded nil check
* improvements based on staticcheck recommendations
* use const for single implementation string
* use const for context key
* use info default log level
* move pid to pluginClient struct
* remove v3 and v4 from multiplexed plugin set
* return from reload when non-multiplexed
* update automtls env string
* combine getBackend and getBrokeredClient
* update comments for plugin reload, Backend return val and log
* revert Backend return type
* allow non-muxed plugins to serve v5
* move v5 code to existing sdk plugin package
* do next export sdk fields now that we have removed extra plugin pkg
* set TLSProvider in ServeMultiplex for backwards compat
* use bool to flag multiplexing support on grpc backend server
* revert userpass main.go
* refactor plugin sdk
- update comments
- make use of multiplexing boolean and single implementation ID const
* update comment and use multierr
* attempt v4 if dispense fails on getPluginTypeForUnknown
* update comments on sdk plugin backend
* use automtls for v5 secrets/auth plugins
* add automtls env guard
* start backend without metadata mode
* use PluginClientConfig for backend's NewPluginClient param
refactor
* - fix pluginutil test
- do not expect plugin to be unloaded in UT
- fix pluginutil tests --need new env var
- use require in UT
- fix lazy load test
* add changelog
* prioritize automtls; improve comments
* user multierror; refactor pluginSet for v4 unit test
* add test cases for v4 and v5 plugin versions
* remove unnecessary call to AutoMTLSSupported
* update comment on pluginSets
* use runconfig directly in sdk newpluginclient
* use automtls without metadatamode for v5 backend plugin registration
* use multierror for plugin runconfig calls
* remove some unnecessary code
* VAULT-6613 add DetermineRoleFromLoginRequest function to Core
* Fix body handling
* Role resolution for rate limit quotas
* VAULT-6613 update precedence test
* Add changelog
* VAULT-6614 start of changes for roles in LCQs
* Expiration changes for leases
* Add role information to RequestAuth
* VAULT-6614 Test updates
* VAULT-6614 Add expiration test with roles
* VAULT-6614 fix comment
* VAULT-6614 Protobuf on OSS
* VAULT-6614 Add rlock to determine role code
* VAULT-6614 Try lock instead of rlock
* VAULT-6614 back to rlock while I think about this more
* VAULT-6614 Additional safety for nil dereference
* VAULT-6614 Use %q over %s
* VAULT-6614 Add overloading to plugin backends
* VAULT-6614 RLocks instead
* VAULT-6614 Fix return for backend factory