* Use DRBG based RSA key generation everywhere
* switch to the conditional generator
* Use DRBG based RSA key generation everywhere
* switch to the conditional generator
* Add an ENV var to disable the DRBG in a pinch
* update go.mod
* Use DRBG based RSA key generation everywhere
* switch to the conditional generator
* Add an ENV var to disable the DRBG in a pinch
* Use DRBG based RSA key generation everywhere
* update go.mod
* fix import
* Remove rsa2 alias, remove test code
* move cryptoutil/rsa.go to sdk
* move imports too
* remove makefile change
* rsa2->rsa
* more rsa2->rsa, remove test code
* fix some overzelous search/replace
* Update to a real tag
* changelog
* copyright
* work around copyright check
* work around copyright check pt2
* bunch of dupe imports
* missing import
* wrong license
* fix go.mod conflict
* missed a spot
* dupe import
- I have a suspicion the for loop with the timer can be infinite loops
in certain circumstances. Instead leverage the normal test helpers
for fetching tidy status
* Track the last PKI auto-tidy time ran for use across nodes
- If the interval time for auto-tidy is longer then say a regularly
scheduled restart of Vault, auto-tidy is never run. This is due to
the time of the last run of tidy is only kept in memory and
initialized on startup to the current time
- Store the last run of any tidy, to maintain previous behavior, to
a cluster local file, which is read in/initialized upon a mount
initialization.
* Add auto-tidy configuration fields for backing off at startup
* Add new auto-tidy fields to UI
* Update api docs for auto-tidy
* Add cl
* Update field description text
* Apply Claire's suggestions from code review
Co-authored-by: claire bontempo <68122737+hellobontempo@users.noreply.github.com>
* Implementing PR feedback from the UI team
* remove explicit defaults and types so we retrieve from backend, decouple enabling auto tidy from duration, move params to auto settings section
---------
Co-authored-by: claire bontempo <68122737+hellobontempo@users.noreply.github.com>
Co-authored-by: claire bontempo <cbontempo@hashicorp.com>
* add gosimport to make fmt and run it
* move installation to tools.sh
* correct weird spacing issue
* Update Makefile
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
* fix a weird issue
---------
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
* Adding explicit MPL license for sub-package.
This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository.
* Adding explicit MPL license for sub-package.
This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository.
* Updating the license from MPL to Business Source License.
Going forward, this project will be licensed under the Business Source License v1.1. Please see our blog post for more details at https://hashi.co/bsl-blog, FAQ at www.hashicorp.com/licensing-faq, and details of the license at www.hashicorp.com/bsl.
* add missing license headers
* Update copyright file headers to BUS-1.1
* Fix test that expected exact offset on hcl file
---------
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
Co-authored-by: Sarah Thompson <sthompson@hashicorp.com>
Co-authored-by: Brian Kassouf <bkassouf@hashicorp.com>
* Fix ACME tidy to not reference acmeCtx
acmeContext is useful for when we need to reference things with a ACME
base URL, but everything used in tidy doesn't need this URL as it is not
coming from an ACME request.
Refactor tidy to remove references to acmeContext, including dependent
functions in acme_state.go.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Remove spurious log message
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Draft Tidy Acme Test with Backdate Storage + Backdate Sysxsx
* Fixes to ACME tidy testing
Co-authored-by: kitography <khaines@mit.edu>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Correctly set account kid to update account status
Co-authored-by: kitography <khaines@mit.edu>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add TestTidyAcmeWithSafetyBuffer
Co-authored-by: kitography <khaines@mit.edu>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add test for disabling tidy operation
Co-authored-by: kitography <khaines@mit.edu>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add acme_account_safety_buffer to auto-tidy config
Resolve: #21872
Co-authored-by: kitography <khaines@mit.edu>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests verifying tidy safety buffers
Co-authored-by: kitography <khaines@mit.edu>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add account status validations and order cleanup tests
---------
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Co-authored-by: kitography <khaines@mit.edu>
Co-authored-by: Steve Clark <steven.clark@hashicorp.com>
- Add a little more debug to try and understand how we could get the
following error I saw within GHA
```
path_tidy_test.go:769: Created root and leaf certificate, deleted leaf, but a got a certificate count of 2
```
- Fix some other issues discovered while running the test through
stress locally
- There's a race within the Plugin reloading mechanism that isn't
trivial to address. To silence some of the failures, switch this
test to use sealing of the cores instead of the plugin reload
mechanism
This should help to prevent the issue of missing tidy configurations
in the future, by placing all related configuration options at the
top with common validation logic.
However, short from this approach is ensuring that each config option
can be specified independently. Thus, the test allows (for any added
and properly tracked tidy operations) verifying that we have enabled
proper storage/retention of that attribute.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Structure of ACME Tidy.
* The tidy endpoints/call information.
* Counts for status plumbing.
* Update typo calls, add information saving date of account creation.
* Missed some field locations.
* Set-up of Tidy command.
* Proper tidy function; lock to work with
* Remove order safety buffer.
* Missed a field.
* Read lock for account creation; Write lock for tidy (account deletion)
* Type issues fixed.
* fix range operator.
* Fix path_tidy read.
* Add fields to auto-tidy config.
* Add (and standardize) Tidy Config Response
* Test pass, consistent fields
* Changes from PR-Reviews.
* Update test to updated default due to PR-Review.
* Telemetry Metrics Configuration.
* Err Shadowing Fix (woah, semgrep is cool).
* Fix TestBackend_RevokePlusTidy_Intermediate
* Add Changelog.
* Fix memory leak. Code cleanup as suggested by Steve.
* Turn off metrics by default, breaking-change.
* Show on tidy-status before start-up.
* Fix tests
* make fmt
* Add emit metrics to periodicFunc
* Test not delivering unavailable metrics + fix.
* Better error message.
* Fixing the false-error bug.
* make fmt.
* Try to fix race issue, remove confusing comments.
* Switch metric counter variables to an atomic.Uint32
- Switch the metric counter variables to an atomic variable type
so that we are forced to properly load/store values to it
* Fix race-issue better by trying until the metric is sunk.
* make fmt.
* empty commit to retrigger non-race tests that all pass locally
---------
Co-authored-by: Steve Clark <steven.clark@hashicorp.com>
* Add global, cross-cluster revocation queue to PKI
This adds a global, cross-cluster replicated revocation queue, allowing
operators to revoke certificates by serial number across any cluster. We
don't support revoking with private key (PoP) in the initial
implementation.
In particular, building on the PBPWF work, we add a special storage
location for handling non-local revocations which gets replicated up to
the active, primary cluster node and back down to all secondary PR
clusters. These then check the pending revocation entry and revoke the
serial locally if it exists, writing a cross-cluster confirmation entry.
Listing capabilities are present under pki/certs/revocation-queue,
allowing operators to see which certs are present. However, a future
improvement to the tidy subsystem will allow automatic cleanup of stale
entries.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow tidying revocation queue entries
No manual operator control of revocation queue entries are allowed.
However, entries are stored with their request time, allowing tidy to,
after a suitable safety buffer, remove these unconfirmed and presumably
invalid requests.
Notably, when a cluster goes offline, it will be unable to process
cross-cluster revocations for certificates it holds. If tidy runs,
potentially valid revocations may be removed. However, it is up to the
administrator to ensure the tidy window is sufficiently long that any
required maintenance is done (or, prior to maintenance when an issue is
first noticed, tidy is temporarily disabled).
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Only allow enabling global revocation queue on Vault Enterprise
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use a locking queue to handle revocation requests
This queue attempts to guarantee that PKI's invalidateFunc won't have
to wait long to execute: by locking only around access to the queue
proper, and internally using a list, we minimize the time spent locked,
waiting for queue accesses.
Previously, we held a lock during tidy and processing that would've
prevented us from processing invalidateFunc calls.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* use_global_queue->cross_cluster_revocation
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Grab revocation storage lock when processing queue
We need to grab the storage lock as we'll actively be revoking new
certificates in the revocation queue. This ensures nobody else is
competing for storage access, across periodic funcs, new revocations,
and tidy operations.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Fix expected tidy status test
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow probing RollbackManager directly in tests
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Address review feedback on revocationQueue
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add more cancel checks, fix starting manual tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Export CreateBackendWithStorage for pkiext
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Move zlint_test.go to pkiext
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Fix mount all test to ignore pkiext
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add automatic tidy of expired issuers
To aid PKI users like Consul, which periodically rotate intermediates,
and provided a little more consistency with older versions of Vault
which would silently (and dangerously!) replace the configured CA on
root/intermediate generation, we introduce an automatic tidy of expired
issuers.
This includes a longer safety buffer (1 year) and logging of the
relevant issuer information prior to deletion (certificate contents, key
ID, and issuer ID/name) to allow admins to recover this value if
desired, or perform further cleanup of keys.
From my PoV, removal of the issuer is thus a relatively safe operation
compared to keys (which I do not feel comfortable removing) as they can
always be re-imported if desired. Additionally, this is an opt-in tidy
operation, not enabled by default. Lastly, most major performance
penalties comes with lots of issuers within the mount, not as much
large numbers of keys (as only new issuer creation/import operations are
affected, unlike LIST /issuers which is a public, unauthenticated
endpoint).
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add test for tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add docs on tidy of issuers
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Restructure logging
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add missing fields to expected tidy output
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Return revocation info within existing certs/<serial> api
- The api already returned both the certificate and a revocation_time
field populated. Update the api to return revocation_time_rfc3339
as we do elsewhere and also the issuer id if it was revoked.
- This will allow callers to associate a revoked cert with an issuer
* Add cl
* PR feedback (docs update)
* Allow tidy operations to be cancelled
When tidy operations take a long time to execute (and especially when
executing them automatically), having the ability to cancel them becomes
useful to reduce strain on Vault clusters (and let them be rescheduled
at a later time).
To this end, we add the /tidy-cancel write endpoint.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add missing auto-tidy synopsis / description
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add a pause duration between tidying certificates
By setting pause_duration, operators can have a little control over the
resource utilization of a tidy operation. While the list of certificates
remain in memory throughout the entire operation, a pause is added
between processing certificates and the revocation lock is released.
This allows other operations to occur during this gap and potentially
allows the tidy operation to consume less resources per unit of time
(due to the sleep -- though obviously consumes the same resources over
the time of the operation).
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for cancellation, pause
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add API docs on pause_duration, /tidy-cancel
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add lock releasing around tidy pause
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Reset cancel guard, return errors
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add ability to perform automatic tidy operations
This enables the PKI secrets engine to allow tidy to be started
periodically by the engine itself, avoiding the need for interaction.
This operation is disabled by default (to avoid load on clusters which
don't need tidy to be run) but can be enabled.
In particular, a default tidy configuration is written (via
/config/auto-tidy) which mirrors the options passed to /tidy. Two
additional parameters, enabled and interval, are accepted, allowing
auto-tidy to be enabled or disabled and controlling the interval
(between successful tidy runs) to attempt auto-tidy.
Notably, a manual execution of tidy will delay additional auto-tidy
operations. Status is reported via the existing /tidy-status endpoint.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add changelog entry
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add documentation on auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add tests for auto-tidy
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Prevent race during parallel testing
We modified the RollbackManager's execution window to allow more
faithful testing of the periodicFunc. However, the TestAutoRebuild and
the new TestAutoTidy would then race against each other for modifying
the period and creating their clusters (before resetting to the old
value).
This changeset adds a lock around this, preventing the races.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Use tidyStatusLock to gate lastTidy time
This prevents a data race between the periodic func and the execution of
the running tidy.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add read lock around tidyStatus gauges
When reading from tidyStatus for computing gauges, since the underlying
values aren't atomics, we really should be gating these with a read lock
around the status access.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>