* actions(install-tools): include os and arch in cache key
When caching and/or restoring our tools, we should include the os and
arch in the key to ensure that we don't accidentally download the wrong
tools on different runners.
We also update the nightlies to specifically cache arm64 before running
the tests.
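Roughly, the idea looks like the following sketch, which assumes `actions/cache`
and uses an illustrative path and key layout rather than the real action's:

```yaml
# Illustrative only: scope the tool cache to the runner's OS and architecture
# so an amd64 runner never restores arm64 binaries (and vice versa).
- name: Cache external tools
  uses: actions/cache@v4
  with:
    path: ~/.local/bin/tools            # hypothetical install path
    key: tools-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('tools/tools.sh') }}
```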
* actionlint: add arm self-hosted runner keys
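For reference, actionlint learns about custom runner labels through its config
file; a minimal sketch with placeholder label names (the real labels differ)
might look like:

```yaml
# .github/actionlint.yaml (sketch): declare the self-hosted labels we use in
# `runs-on` so actionlint doesn't flag them as unknown. The label names below
# are placeholders, not the repository's actual runner labels.
self-hosted-runner:
  labels:
    - custom-linux-arm64-small
    - custom-linux-arm64-xl
```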
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* actions: use self-hosted runners in hashicorp/vault
While it is recommended that we use self-hosted runners for every
workflow in private and internal accounts, this change was primarily
motivated by different runner types using different cache paths. By
using the same runner type everywhere, we can avoid duplicate caches of the
internal Vault tools.
* disable the terraform wrapper in ci-bootstrap to handle the updated action
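As a hedged sketch of what that looks like (step placement and version pin are
illustrative), `hashicorp/setup-terraform` exposes a `terraform_wrapper` input
that can be turned off so later steps see plain `terraform` output:

```yaml
# Sketch: the wrapper script replaces the terraform binary and changes how
# stdout/stderr and exit codes surface, which can break steps that parse
# terraform's output; disabling it restores the plain binary.
- name: Set up Terraform
  uses: hashicorp/setup-terraform@v3
  with:
    terraform_wrapper: false
```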
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* [VAULT-39671] tools: use github cache for external tools
We currently have ~13 tools that we need available both locally for
development and in CI for building, linting, formatting, and testing Vault.
Each branch that we maintain uses the same set of tools, but often pinned
to different versions.
For development, we have a `make tools` target that executes the
`tools/tools.sh` installation script for the various tools at the correct pin.
This works well enough but is cumbersome if you’re working across many branches
that have divergent versions.
For CI the problem is speed and repetition. For each build job (~10) and Go test
job (16-52) we have to install most of the same tools. As we have extremely
limited Github Actions cache space, we can't afford to cache the entire Vault
Go build cache, so building the tools from source on every job would incur the
penalty of downloading all of the modules and compiling each tool, which adds
roughly an extra 2 minutes per job to install all of the tools. We've
worked around this problem by writing composite actions that download pre-built
binaries of the same tools instead of building them from source. That usually
takes a few seconds. The downside of that approach is rate limiting, which
Github has become much more aggressive in enforcing.
That brings us to where things stood before this work:
- For builds in the compatibility docker container: the tools are built from
source and cached as a separate builder image layer (usually fast as we get
cache hits, slow on cache misses)
- For builds that compile directly on the runner: the tools are installed on
each job runner by composite github actions (fast, uses API requests, prone
to throttling)
- For tests, they use the same composite actions to install the tools on each
job. (fast, uses API requests, prone to throttling)
This also leads to inconsistencies, since there are two sources of truth: the
composite actions have their own version pins outside of those in `tools.sh`,
which has led to drift.
We previously tried to save some API requests by moving all builds into
the container. That almost worked, but docker's build container had a hard
time with some esoteric builds. We could special-case those, but it's a
band-aid at best.
A prior version of this work (VAULT-39654) investigated using `go tool`, but
there were some showstopper issues with that workflow that made it a non-starter
for us. Instead, we'll lean more heavily on the actions cache to resolve the
throttling. This gives us a single source of truth for tools and their pins,
and affords us the same speed on cache hits as we had previously, without
downloading the tools from github releases thousands of times per day.
We add a new composite github action for installing our tools; a rough sketch
of its shape follows the list below.
- On cache misses it builds the tools and installs them into a cacheable path.
- On cache hits it restores the cacheable path.
- It adds the tools to the GITHUB_PATH to ensure runner-based jobs can find
them.
- For Docker builds it mounts the tools at `/opt/tools/bin`, which is
part of the PATH in the container.
- It uses a cache key composed of the SHA of the tools directory along with
the working directory SHA, which is required to deal with actions/cache
issues.
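A rough sketch of such a composite action, assuming `actions/cache` and using
illustrative paths, step ids, and script arguments (the real `install-tools`
action differs in its details):

```yaml
# .github/actions/install-tools/action.yml (sketch only)
name: install-tools
description: Restore the external tool binaries from cache, or build them on a miss
runs:
  using: composite
  steps:
    - id: tools-cache
      uses: actions/cache@v4
      with:
        path: .tools/bin                                   # hypothetical cacheable path
        key: tools-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('tools/**') }}
    - if: steps.tools-cache.outputs.cache-hit != 'true'
      shell: bash
      run: GOBIN="$PWD/.tools/bin" ./tools/tools.sh        # hypothetical invocation; builds the pinned tools
    - shell: bash
      run: echo "$PWD/.tools/bin" >> "$GITHUB_PATH"        # make the tools visible to later steps
```

For Docker-based builds, the same cached directory can then be mounted into the
container at `/opt/tools/bin`.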
This results in:
- A single source of truth for tools and their pins
- A single cache for tools that can be re-used between all CI and build jobs
- No more Github API calls for tooling. _Rate limiting will be a thing of
the past._
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
Add a new `github` changed file group that includes everything in the
`.github` directory. Further refine the `pipeline` group to only
include scripts, workflows, and actions files in `.github`. We also move
the `CODEOWNERS` file into `.github/` to simplify `github` grouping.
Because the `build` logic responds to changes to the `pipeline` group, this
means we will no longer build and test everything for simple changes in
`github` that don't affect the pipeline.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
[VAULT-39160] actions(hcp): add support for testing custom images on HCP (#9345)
Add support for running the `cloud` scenario with a custom image in the
int HCP environment. We support two new tags that trigger new
functionality. If the `hcp/build-image` tag is present on a PR at the
time of `build`, we'll automatically trigger a custom build for the int
environment. If the `hcp/test` tag is present, we'll trigger a custom
build and run the `cloud` scenario with the resulting image.
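As a hedged illustration of how labels can gate this behavior (job names,
runner images, and steps here are placeholders, not the actual workflow):

```yaml
# Sketch: run the custom int-environment image build when either tag is
# present, and the cloud scenario only when `hcp/test` is present.
build-hcp-image:
  if: >-
    contains(github.event.pull_request.labels.*.name, 'hcp/build-image') ||
    contains(github.event.pull_request.labels.*.name, 'hcp/test')
  runs-on: ubuntu-latest
  steps:
    - run: echo "trigger the custom HCP int image build here"

test-hcp-cloud-scenario:
  if: contains(github.event.pull_request.labels.*.name, 'hcp/test')
  needs: build-hcp-image
  runs-on: ubuntu-latest
  steps:
    - run: echo "run the enos cloud scenario against the built image here"
```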
* Fix a bug in our custom build pattern to handle prerelease versions.
* pipeline(hcp): add `--github-output` support to `show image` and
`wait image` commands.
* enos(hcp/create_vault_cluster): use a unique identifier for HVN
and vault clusters.
* actions(enos-cloud): add workflow to execute the `cloud` enos
scenario.
* actions(build): add support for triggering a custom build and running
the `enos-cloud` scenario.
* add more debug logging and query without a status
* add shim build-hcp-image for CE workflows
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
Update our action pins to the latest versions. Essentially all of these are
related to actions needing to run on Node 24. Both the self-hosted and the
Github-hosted runners that we use are on a new enough version of
actions/runner that this shouldn't be a problem.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* license: add support for publishing artifacts to IBM PAO (#8366)
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: brian shore <bshore@hashicorp.com>
Co-authored-by: Ethel Evans <ethel.evans@hashicorp.com>
Co-authored-by: Ryan Cragun <me@ryan.ec>
As part of this we also update the pin of gotestsum to 1.12.3 to allow
for building it with Go 1.25.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* VAULT-34830: enable the new workflow (#8661)
* pipeline: various fixes for the cutover to the enterprise first workflow (#8686)
Various small fixes that were discovered when doing the cutover to the enterprise-first merge workflow:
- The `actions-docker-build` action infers enterprise metadata magically from the repository name. Use a branch that allows configuring the repo name until it's merged upstream.
- Fix some CE-In-Enterprise outputs in our metadata job.
- Pass the recurse depth flag correctly when creating backports
- Set the package name when calling the `build-vault` composite action
- Disallow merging changes into `main` and `release/*` when executing in the `hashicorp/vault` repository. This is a hack until PSS-909 is resolved.
- Use self-hosted runners when testing arm64 CE containers in enterprise.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Conflicts:
.github/workflows/backport-automation-ent.yml
.github/workflows/test-run-enos-scenario-containers.yml
---------
Signed-off-by: Ryan Cragun <me@ryan.ec>
Various small changes and tweaks to our CI/CD workflows to allow for running CE branches in the context of `hashicorp/vault-enterprise`.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Ubuntu 20.04 has reached EOL and is no longer a supported runner host distro. Historically we've relied on it for our CGO builds as it contains an old enough version of glibc that we can retain compatibility with all of our supported distros and build on a single host distro. Rather than requiring a new RHEL 8 builder (or some equivalent), we instead build CGO binaries inside an Ubuntu 20.04 container along with its glibc and various C compilers.
I've separated out system package changes, the Go toolchain install, and the external build tools install into different container layers so that the builder container used for each branch is maximally cacheable.
On cache misses these changes result in noticeably longer build times for CGO binaries. That is unavoidable with this strategy. Most of the time our builds will get a cache hit on all layers unless one of the following has changed (see the sketch after this list):
- .build/*
- .go-version
- .github/actions/build-vault
- tools/tools.sh
- Dockerfile
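One hedged sketch of how those layers can be cached from CI, assuming BuildKit's GitHub Actions cache backend (the repository's actual build steps, tags, and stage names may differ):

```yaml
# Sketch: build the CGO builder image with layer caching backed by the
# Actions cache, so unchanged layers (system packages, Go toolchain, tools)
# are reused across runs and branches.
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v6
  with:
    file: Dockerfile
    target: builder                  # hypothetical stage name
    load: true
    tags: vault-cgo-builder:local
    cache-from: type=gha
    cache-to: type=gha,mode=max
```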
I've tried my best to reduce the cache space used by each layer. Currently our build container takes about 220MB of cache space. About half of that ought to be shared cache between main and release branches. I would expect total new cache used to be in the 500-600MB range, or about 5% of our total space.
Some follow-up ideas that we might want to consider:
- Build everything inside the build container and remove the github actions that set up external tools
- Instead of building external tools with `go install`, migrate them into build scripts that install pre-built `linux/amd64` binaries
- Migrate the external tools to `go tool` and use it in the builder container. This requires us to be on Go 1.24 everywhere, so it ought not be considered until that is a reality.
Signed-off-by: Ryan Cragun <me@ryan.ec>
While working on VAULT-34829 it became apparent that if our new backporter
could know which branches are active and which CE counterparts are active
then we could completely omit the need for `ce` backport labels and instead
automatically backport to corresponding CE branches that are active.
To facilitate that we can re-use our `.release/versions.hcl` file as it is
the current source of truth for our present backport assistant workflow.
Here we add a new `pipeline releases list versions` command that is capable
of decoding that file and optionally displaying it. It will be used in the
next PR that fully implements VAULT-34829.
As part of this work we refactored `pipeline releases` to include a new `list`
sub-command and moved both `list-active-versions` and `versions` to it.
We also include a few small fixes that were noticed:
- `.release/versions.hcl` was not up-to-date
- Our cached dynamic config was not getting recreated when the pipeline
tool changed. That has been fixed so now dynamic config should always
get recreated when the pipeline binary changes
- We now initialize a git client when using the `github` sub-command.
This will be used in more forthcoming work
- Update our changed file detection to resolve some incorrect groupings
- Add some additional changed file helpers that will be used in forthcoming
work
Signed-off-by: Ryan Cragun <me@ryan.ec>
* VAULT-36174: pipeline(go): Deal with relative path issues in cache restore
Cache restores have some surprising relative pathing behavior that we need
to deal with. Since our Go module cache path is in $HOME
(/home/runner/go/...) and $HOME can be different depending on whether we're
on self-hosted vs Github-hosted runners (/home/runner/actions-runner/_work
vs. /home/runner/work), we need to factor the absolute path of $HOME into
our cache key. If we don't, then cache restores will be incompatible on one
or the other runner. This is because the tar restore uses relative paths
backwards and our directory depth doesn't match on both runners.
We also slightly change our module caches here to only include
`.../mod/cache` instead of the whole `.../mod` directory. Go can unzip the
modules from the cache faster than the longer download of a cache that
contains the entire mod directory.
This will double our module caches *sigh* but they ought to be smaller
as they only contain the module zips.
See: https://github.com/actions/cache/issues/1127
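A hedged sketch of the resulting shape, with illustrative step ids and key
parts (the real workflow differs):

```yaml
# Sketch: hash the absolute working directory into the key so archives
# restored with relative paths land at the same depth on self-hosted and
# Github-hosted runners, and cache only the module download cache.
- id: workdir
  shell: bash
  run: echo "hash=$(echo -n "$PWD" | sha256sum | cut -d' ' -f1)" >> "$GITHUB_OUTPUT"
- uses: actions/cache@v4
  with:
    path: ~/go/pkg/mod/cache         # just the module zips, not the whole .../mod tree
    key: go-mod-${{ runner.os }}-${{ steps.workdir.outputs.hash }}-${{ hashFiles('**/go.sum') }}
```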
Signed-off-by: Ryan Cragun <me@ryan.ec>
* use working dir, not home
Signed-off-by: Ryan Cragun <me@ryan.ec>
* fix logic on no-save
Signed-off-by: Ryan Cragun <me@ryan.ec>
* do a module list after download
Signed-off-by: Ryan Cragun <me@ryan.ec>
---------
Signed-off-by: Ryan Cragun <me@ryan.ec>
Something recently changed in how Github Actions decodes and displays git
error output. Previously benign errors are now showing up as annotations
and causing confusion.
Signed-off-by: Ryan Cragun <me@ryan.ec>
* Refactor security-scan workflow
* Fix step names
"Set up" is correct grammatically and canonical with the "Set up job" step at the start of the GHA run log or UI.
* enos(artifactory): unify dev and test scenario artifactory metadata into new module
There was previously a lot of shared logic between
`build_artifactory_artifact` and `build_artifactory_package` with regard to
building an artifact name. When it comes down to it, both
modules are very similar and their only major difference is searching
for any artifact (released or not) by either a combination of
`revision`, `edition`, `version`, and `type` vs. searching for a
released artifact with a combination of `version`, `edition`, and
`type`.
Rather than bolt on new `s390x` and `fips1403` artifact metadata to
both, I factored their metadata for package names and such into a
unified and shared `artifact/metadata` module that is now called by
both.
This was tricky as dev and test scenarios currently differ in what
we pass in as the `vault_version`, but we hope to remove that
difference soon. We also add metadata support for the forthcoming
FIPS 140-3.
This commit was tested extensively, along with other test scenarios, in
support of `s390x`, but it will be useful immediately for FIPS 140-3,
so I've extracted it out.
Signed-off-by: Ryan Cragun <me@ryan.ec>
* Fix artifactory metadata before merge
The initial pass of the artifactory metadata was largely untested and
extracted from a different branch. After testing, this commit fixes a
few issues with the metadata module.
In order to test this I also had to fix an issue where AWS secrets
engine testing became a requirement but is impossible unless you execute
against a blessed AWS account that has the required roles. Instead, we now
make those verifications opt-in via a new variable.
We also make some improvements to the pki-verify-certificates script so
that it works reliably against all our supported distros.
We also update our dynamic configuration to use the updated versions in
samples.
Signed-off-by: Ryan Cragun <me@ryan.ec>