Pin to the latest actions in preparation for the migration to
`actions/upload-artifact@v4`, `actions/download-artifact@v4`, and
`hashicorp/actions-docker-build@v2` on May 6 or 7.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Context
-------
Building and testing Vault artifacts on pull requests and merges is
responsible for about 1/3rd of our overall spend on Vault CI. Of the
artifacts that we ship as part of a release, we do Enos testing scenarios
on the `linux/amd64` and `linux/arm64` binaries and their derivative
artifacts. The extended build artifacts for non-Linux platforms or less
common machine architectures are not tested at this time. They are built,
notarized, and signed as part of every pull request update and merge. As
we don't actually test these artifacts, the only gain we get from this
rather expensive behavior is that we wont merge a change that would prevent
Vault from building on one of the extended targets. Extended platform or
architecture changes are quite rare, so performing this work as frequently
as we do is costly in both monetary and developer time for little relative
safety benefit.
Goals
-----
Rethink and implement how and when we build binaries and artifacts of Vault
so that we can spend less money on repetitive work and while also reducing
the time it takes for the build and test pipelines to complete.
Solution
--------
Instead of building all release artifacts on every push, we'll opt to build
only our testable (core) artifacts. With this change we are introducing a
bit of risk. We could merge a change that breaks an extended platform and
only find out after the fact when we trigger a complete build for a release.
We'll hedge against that risk by building all of the release targets on a
scheduled cadence to ensure that they are still buildable.
We'll make building all of the targets optional on any pull request by
use of a `build/all` label on the pull request.
Further considerations
----------------------
* We want to reduce the total number of workflows and runners for all of our
pipelines if possible. As each workflow runner has infrastructure cost and
runner time penalties, using a single runner over many is often preferred.
* Many of our jobs runners have been optimized for cost and performance. We
should simplify the choices of which runners to use.
* CRT requires us to use the same build workflow in both CE and Ent.
Historically that meant that modifying `build.yml` in CE would result in a
merge conflict with `build.yml` in Ent, and break our merge workflows.
* Workflow flow control in both `build.yml` and `ci.yml` can be quite
complicated, as each needs to maintain compatibility whether executed as CE
or Ent, and when triggered with various Github events like pull_request,
push, and workflow_call, each with their own requirements.
* Many jobs utilize similar patterns of flow control and metadata but are not
reusable.
* Workflow call depth has a maximum of four, so we need to be quite
considerate when calling other workflows.
* Called workflows can only have 10 inputs.
Implementation
--------------
* Refactor the `build.yml` workflow to be agnostic to whether or not it is
executing in CE or Ent. That makes future updates to the build much easier
as we won't have to worry about merge conflicts when the change is merged
downstream.
* Extract common steps in workflows into composite actions that we can reuse.
* Fix bugs where some but not all workflows would use different Git
references when building and testing a pull request.
* We rewrite the application, docs, and UI change helpers as a composite
action. This allows us to re-use this logic to make consistent behavior
choices across build and CI.
* We combine several `build.yml` and `ci.yml` jobs into our final job.
This reduces the number of workflows required for the same behavior while
saving time overall.
* Update most of our action pins.
Results
-------
| Metric | Before | After | Diff |
|-------------------|----------|---------|-------|
| Duration: | ~14-18m | ~15-18m | ~ = |
| Workflows: | 43 | 18 | - 58% |
| Billable time: | ~1h15m | 16m | - 79% |
| Saved artifacts: | 34 | 12 | - 65% |
Infra costs should map closely to billable time.
Network I/O costs should map closely to the workflow count.
Storage costs should map directly with saved artifacts.
We could probably get parity with duration by getting more clever with
our UBI container build, as that's where we're seeing the increase. I'm
not yet concerned as it takes roughly the same time for this job to
complete as it did before.
While the CI workflow was not the focus on the PR, some shared
refactoring does show some marginal improvements there.
| Metric | Before | After | Diff |
|-------------------|----------|----------|--------|
| Duration: | ~24m | ~12.75m | - 15% |
| Workflows: | 55 | 47 | - 8% |
| Billable time: | ~4h20m | ~3h36m | - 7% |
Further focus on streamlining the CI workflows would likely result in a
few more marginal improvements, but nothing on the order like we've seen
with the build workflow.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Improve our build workflow execution time by using custom runners,
improved caching and conditional Web UI builds.
Runners
-------
We improve our build times[0] by using larger custom runners[1] when
building the UI and Vault.
Caching
-------
We improve Vault caching by keeping a cache for each build job. This
strategy has the following properties which should result in faster
build times when `go.sum` hasn't been changed from prior builds, or
when a pull request is retried or updated after a prior successful
build:
* Builds will restore cached Go modules and Go build cache according to
the Go version, platform, architecture, go tags, and hash of `go.sum`
that relates to each individual build workflow. This reduces the
amount of time it will take to download the cache on hits and upload
the cache on misses.
* Parallel build workflows won't clobber each others build cache. This
results in much faster compile times after cache hits because the Go
compiler can reuse the platform, architecture, and tag specific build
cache that it created on prior runs.
* Older modules and build cache will not be uploaded when creating a new
cache. This should result in lean cache sizes on an ongoing basis.
* On cache misses we will have to upload our compressed module and build
cache. This will slightly extend the build time for pull requests that
modify `go.sum`.
Web UI
------
We no longer build the web UI in every build workflow. Instead we separate
the UI building into its own workflow and cache the resulting assets.
The same UI assets are restored from cache during build worklows. This
strategy has the following properties:
* If the `ui` directory has not changed from prior builds we'll restore
`http/web_ui` from cache and skip building the UI for no reason.
* We continue to use the built-in `yarn` caching functionality in
`action/setup-node`. The default mode saves the `yarn` global cache.
to improve UI build times if the cache has not been modified.
Changes
-------
* Add per platform/archicture Go module and build caching
* Move UI building into a separate job and cache the result
* Restore UI cache during build
* Pin workflows
Notes
-----
[0] https://hashicorp.atlassian.net/browse/QT-578
[1] https://github.com/hashicorp/vault/actions/runs/5415830307/jobs/9844829929
Signed-off-by: Ryan Cragun <me@ryan.ec>
* address lint reports
* add diff-oss-ci and test-ui jobs to ci GHA workflow
* Add actions linter workflow
* Fix actions linter errors
* pin 3rd party components with SHA hash and limit actionlint workflow to pull requests touching paths under .github directory
* Fix actionlint runner
* pin SHA hash of 3rd party components
use .go-version file to provide go version to setup-go action
remove unncessary ref parameter in checkout action
---------
Co-authored-by: Brian Shore <bshore@hashicorp.com>
If we don't guard against pull_request being null, we do a lot of extra
checkout and path filtering, and it ends up putting everything in the UI
board.
I tested this in another repo, and it seems to behave correctly.
Add Open Source project workflow
This will help us triage open source issues into our various internal
project boards.
I tested this on a separate repo, and it seems to work.