mirror of
https://github.com/hashicorp/vault.git
synced 2025-11-08 04:11:39 +01:00
The code that loads the trusted certificate cache for cert-based
authentication ignores any error that occurs while attempting to load
any of the certificates that it finds. Undoubtedly some deployments
have broken certificates or other non-certificate files stored in
their respective back-ends, and so this is important behavior: we
don't want to fail authentication just because `README.md` is not a
valid certificate!
In addition, because listing files and loading certificates is
expensive, the server maintains a cache of trusted certificates. This
cache is populated the first time it's needed, and then used for the
lifetime of the process. If a file fails to load as a certificate,
then it is simply not included in the cache.
These two things lead to a problem when using a backend that might be
subject to transient failures: a hiccough in the certificate loading
process can cause the server to establish a cache that is missing an
otherwise valid certificate. This can then lead to clients failing to
authenticate to the server, until such time as the server is restarted
and the cache reloaded.
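
The failure mode described above can be reproduced with a small sketch. It is an illustration only, not the code in Vault: `naiveCache`, its `get` method, and `demoStuck` are hypothetical names, and certificates are represented as plain strings. The cache is populated once, load errors are ignored, and a single transient failure leaves an entry missing for good.

```go
package main

import "errors"

// naiveCache is a hypothetical populate-once cache; it is not Vault's type.
type naiveCache struct {
	certs map[string]string // loaded entries, keyed by file name
}

// get populates the cache on first use and returns it unchanged afterwards.
func (c *naiveCache) get(names []string, load func(string) (string, error)) map[string]string {
	if c.certs != nil {
		return c.certs // populated once, never revisited
	}
	c.certs = make(map[string]string)
	for _, name := range names {
		cert, err := load(name)
		if err != nil {
			continue // error ignored: the entry is silently left out
		}
		c.certs[name] = cert
	}
	return c.certs
}

// demoStuck simulates one transient failure while loading "b.pem" and
// reports the cache size on the first and second lookups.
func demoStuck() (first, second int) {
	failedOnce := false
	load := func(name string) (string, error) {
		if name == "b.pem" && !failedOnce {
			failedOnce = true
			return "", errors.New("transient storage failure")
		}
		return "cert:" + name, nil
	}
	names := []string{"a.pem", "b.pem"}
	var c naiveCache
	first = len(c.get(names, load))
	second = len(c.get(names, load))
	return first, second
}
```

In this sketch the second lookup still sees only one entry, even though storage would now serve `b.pem` without error.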
This change makes the certificate cache more resilient to loading
failures, by caching partial successes. With this patch, the cache
behavior becomes:
- If the cache exists *and* is either complete or it is not yet time
  to attempt to reload the certificates, then the cached results are
  used without reservation.
- Otherwise we attempt to load the certificates from storage:
  - If the cache does not already exist then a new, empty cache is
    created.
  - The storage is listed, we attempt to load everything in storage,
    skipping things that we have already successfully loaded, and
    skipping things that we cannot load, as usual.
  - Once we have attempted to load everything from storage, if there
    were any errors, we compute a deadline for retrying the load, with
    an exponentially increasing delay. If there were no errors, then
    the cache is considered complete, and there will be no retry.
This has the nice behavior that we recover from transient failures
eventually, while the exponential back-off ensures that we don't waste
too much time attempting to load certificates that can never be
loaded.
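
The cache behavior listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: `storage`, `certCache`, `get`, and `demo` are hypothetical names, certificates are plain strings, and the initial delay and the cap on the delay are assumed values that the commit message does not specify.

```go
package main

import (
	"errors"
	"time"
)

// storage is a hypothetical stand-in for the backend that holds the
// trusted certificates; it is not Vault's storage API.
type storage struct {
	list func() ([]string, error)
	load func(name string) (string, error)
}

// certCache is an illustrative type, not the one used by the patch.
type certCache struct {
	certs    map[string]string // entries loaded so far, keyed by name
	complete bool              // set once a full pass sees no errors
	delay    time.Duration     // current back-off delay
	retryAt  time.Time         // earliest time of the next load attempt
}

// Assumed values; the commit message does not state the real ones.
const (
	initialDelay = time.Second
	maxDelay     = 5 * time.Minute
)

// get returns the cached entries, reloading from storage only when the
// cache is missing, or incomplete and past its retry deadline.
func (c *certCache) get(s storage, now time.Time) map[string]string {
	if c.certs != nil && (c.complete || now.Before(c.retryAt)) {
		return c.certs
	}
	if c.certs == nil {
		c.certs = make(map[string]string)
	}

	names, err := s.list()
	failed := err != nil
	for _, name := range names {
		if _, ok := c.certs[name]; ok {
			continue // already loaded on an earlier pass
		}
		cert, err := s.load(name)
		if err != nil {
			failed = true // skip it, but remember the pass was partial
			continue
		}
		c.certs[name] = cert
	}

	if !failed {
		c.complete = true // no errors: no retry will be scheduled
		return c.certs
	}
	// Double the delay after each failed pass, up to the assumed cap.
	if c.delay == 0 {
		c.delay = initialDelay
	} else if c.delay *= 2; c.delay > maxDelay {
		c.delay = maxDelay
	}
	c.retryAt = now.Add(c.delay)
	return c.certs
}

// demo simulates one transient failure on "b.pem" and reports the cache
// size after the first pass, before the retry deadline, and at the deadline.
func demo() (first, early, late int) {
	calls := 0
	s := storage{
		list: func() ([]string, error) { return []string{"a.pem", "b.pem"}, nil },
		load: func(name string) (string, error) {
			if name == "b.pem" {
				calls++
				if calls == 1 {
					return "", errors.New("transient storage failure")
				}
			}
			return "cert:" + name, nil
		},
	}
	var c certCache
	t0 := time.Unix(0, 0)
	first = len(c.get(s, t0))
	early = len(c.get(s, t0.Add(initialDelay/2)))
	late = len(c.get(s, t0.Add(initialDelay)))
	return first, early, late
}
```

Passing `now` in explicitly keeps the sketch deterministic; a real server would presumably read the clock itself and guard the cache with a lock. A file that can never be loaded keeps the cache incomplete in this sketch, so it is retried at ever longer intervals rather than on every request.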
Co-authored-by: John Doty <john.doty@databricks.com>
Co-authored-by: Steven Clark <steven.clark@hashicorp.com>
4 lines
142 B
Plaintext
```release-note:bug
auth/cert: Recover from partially populated caches of trusted certificates if one or more certificates fails to load.
```