As the Vault pipeline and release processes evolve over time, so too must the tooling that drives them. Historically we've used a combination of CI features and shell scripts wrapped in make targets to drive our CI. While this approach has worked, it requires careful consideration of which features to use (bash in CI almost never matches bash on developer machines, etc.) and often requires a deep understanding of several CLI tools (jq, etc.). `make` itself also has user experience limitations, e.g. passing flags.
As we're all in on GitHub Actions as our pipeline coordinator, continuing to build and use CLI tools to perform our pipeline tasks makes sense. This PR adds a new CLI tool called `pipeline` which we can use to build new isolated tasks that we can string together in GitHub Actions. We intend to use this utility as the interface for future release automation work; see VAULT-27514.
For the first task in this new `pipeline` tool, I've chosen to build two small sub-commands:
* `pipeline releases list-versions` - Lists Vault versions within a range. The range is configurable either by setting `--upper` and/or `--lower` bounds, or by using the `--nminus` flag to set the N-X to go back from the current branch's version. As CE and ENT do not have version parity we also consider the `--edition`, as well as none-to-many `--skip` flags to exclude specific versions.
* `pipeline generate enos-dynamic-config` - Creates dynamic enos configuration based on the branch and the current list of release versions. It takes largely the same flags as the `releases list-versions` command, but it also expects a `--dir` for the enos directory and a `--file` where the dynamic configuration will be written. This allows us to dynamically update and feed the latest versions into our sampling algorithm to get coverage over all supported prior versions.
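The range selection described for `list-versions` can be sketched roughly as follows. This is not the `pipeline` tool's implementation: the function names, the plain `X.Y.Z` version strings, and the reading of N-X as "go back X minor versions from the current branch's version" are all assumptions made for illustration, and `--edition` handling is left out.

```python
from typing import Iterable, List, Optional, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain 'X.Y.Z' string into a comparable tuple (hypothetical helper)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def list_versions(
    released: Iterable[str],
    current: str,
    nminus: Optional[int] = None,
    lower: Optional[str] = None,
    upper: Optional[str] = None,
    skip: Iterable[str] = (),
) -> List[str]:
    """Return released versions inside the requested range, oldest first.

    Assumption: N-X means "X minor versions behind the current version".
    An explicit lower bound takes precedence over nminus.
    """
    cur = parse(current)
    high = parse(upper) if upper else cur
    if lower:
        low = parse(lower)
    elif nminus is not None:
        low = (cur[0], max(cur[1] - nminus, 0), 0)
    else:
        low = (0, 0, 0)
    skipped = {parse(s) for s in skip}
    candidates = {parse(r) for r in released}
    selected = sorted(v for v in candidates if low <= v <= high and v not in skipped)
    return [".".join(str(n) for n in v) for v in selected]


# Illustrative data only; these are not claims about real release history.
released = ["1.15.6", "1.16.0", "1.16.3", "1.17.0", "1.17.5", "1.18.0"]
result = list_versions(released, current="1.18.1", nminus=2, skip=["1.17.0"])
```

With the illustrative inputs above, the lower bound works out to `1.16.0`, so `1.15.6` is dropped by the range and `1.17.0` by the skip list.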
We then integrate these new tools into the pipeline itself and cache the dynamic config on a weekly basis. We also cache the pipeline tool itself as it will likely become a repository for pipeline-specific tooling. Caching the `pipeline` tool should make most workflows that require it very fast.
Signed-off-by: Ryan Cragun <me@ryan.ec>
* VAULT-31402: Add verification for all container images
Add verification for all container images that are generated as part of
the build. Before this change we only ever tested a limited subset of
"default" containers based on Alpine Linux that we publish via the
Docker hub and AWS ECR.
Now we support testing all Alpine and UBI based container images. We
also verify the repository and tag information embedded in each by
deploying them and verifying the repo and tag metadata match our
expectations.
This does change the k8s scenario interface quite a bit. We now take in
an archive image and set image/repo/tag information based on the
scenario variants.
To enable this I also needed to add `tar` to the UBI base image. It was
already available in the Alpine image and is used to copy utilities to
the image when deploying and configuring the cluster via Enos.
Since some images contain multiple tags we also add samples for each
image and randomly select which variant to test on a given PR.
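As a rough illustration of the sampling idea, the sketch below picks one tag per image at random. The image and tag names are placeholders, and the real selection happens in the Enos sampling configuration rather than in code like this; the seed parameter is only an assumption to make runs reproducible.

```python
import random
from typing import Dict, List, Optional


def pick_variants(image_tags: Dict[str, List[str]], seed: Optional[int] = None) -> Dict[str, str]:
    """Choose one tag per image at random (hypothetical helper).

    Images are visited in sorted order so that a fixed seed always
    produces the same selection.
    """
    rng = random.Random(seed)
    return {image: rng.choice(tags) for image, tags in sorted(image_tags.items())}


# Placeholder names; not the real image or tag names.
images = {
    "image-a": ["tag-1", "tag-2"],
    "image-b": ["tag-1", "tag-2", "tag-3"],
}
selection = pick_variants(images, seed=42)
```

Over many PRs every tag of every image should eventually be exercised, while each individual run only pays for one variant per image.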
Signed-off-by: Ryan Cragun <me@ryan.ec>
- If we encounter a deadlock or long-running test it is better to have go
test time out. We've noticed that if we hit the GitHub step timeout we
lose all information about what was running at the time of the timeout,
making things harder to diagnose.
- When go test itself times out on a long-running test, it outputs which
test was running along with a full panic output in the logs, which is
quite useful for diagnosis.
In order for our enterprise nightlies to run the same test-go job but
across a matrix of different base references we need to consider the
checkout ref in our failure and summary uploads in order to prevent
an upload race.
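One way to picture the fix: if every matrix entry derives its upload name from the checkout ref, two entries can no longer write to the same artifact. The helper and naming scheme below are hypothetical and are not the names the workflow actually uses.

```python
import re


def artifact_name(base: str, checkout_ref: str) -> str:
    """Build an upload name that is unique per checkout ref (hypothetical helper).

    Characters other than letters, digits, '.', '_' and '-' are collapsed
    into '-' so that refs such as 'release/1.16.x' become safe name suffixes.
    """
    safe_ref = re.sub(r"[^A-Za-z0-9._-]+", "-", checkout_ref).strip("-")
    return f"{base}-{safe_ref}" if safe_ref else base


# Two matrix entries running the same job against different base references.
names = [artifact_name("test-results", ref) for ref in ("main", "release/1.16.x")]
```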
We also configure Git with our token before setting up Go so that
enterprise CI workflows can execute without downloading a module cache.
Signed-off-by: Ryan Cragun <me@ryan.ec>
It appears that with the latest runner image[0] we occasionally see
a flaky test with an error related to our fontconfig cache:
```
Error: Browser timeout exceeded: 10s
Error while executing test: Acceptance | wrapped_token query param functionality: it authenticates when used with the with=token query param
Stderr:
Fontconfig error: No writable cache directories
[0822/180212.113587:WARNING:sandbox_linux.cc(430)] InitializeSandbox() called with multiple threads in process gpu-process.
```
This change rebuilds the fontconfig cache on GitHub-hosted runners.
Hopefully we can remove this at some point when a new runner image is
released.
[0] https://github.com/actions/runner-images/releases/tag/ubuntu22%2F20240818.1
Signed-off-by: Ryan Cragun <me@ryan.ec>
Optimize the cost of the Security `scan` workflow by using a
different runner. Previously this workflow would use the
`custom-linux-xl` runner in `vault` vs. the `c6a.4xlarge` on-demand
runner in `vault-enterprise`. This resulted in the `vault` workflow
costing an order of magnitude more each month.
I tested with the following instance sizes to compare cost to execution
time:
| Runner | Estimated Time | Cost Factor | Cost Score |
|---------|-----------------|-------------|-------------|
|ubuntu-latest|19m|1|19|
|custom-linux-small|21.5m|2|43|
|custom-linux-medium|11.5m|4|46|
|custom-linux-xl|8.5m|16|136|
Currently the required `CI` and `build` workflows take anywhere from
16-20 minutes on `vault`. Our goal is to not exceed that.
At this time we're going to try out `ubuntu-latest` as it gives us ~85%
savings and by far the best bang for our buck. If it ends up being a
burden we can switch to `custom-linux-medium` for ~66% cost savings but
still a reasonable runtime.
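The cost score in the table appears to be the estimated time in minutes multiplied by the cost factor, and the savings figures follow from comparing each score against `custom-linux-xl`. The short sketch below reproduces that arithmetic; the variable and function names are illustrative only.

```python
# (estimated minutes, cost factor) per runner, taken from the table above.
runners = {
    "ubuntu-latest": (19.0, 1),
    "custom-linux-small": (21.5, 2),
    "custom-linux-medium": (11.5, 4),
    "custom-linux-xl": (8.5, 16),
}

# Cost score = estimated minutes * cost factor.
scores = {name: minutes * factor for name, (minutes, factor) in runners.items()}


def savings_vs(baseline: str, candidate: str) -> float:
    """Fractional savings of `candidate` relative to `baseline`."""
    return 1 - scores[candidate] / scores[baseline]


ubuntu_savings = savings_vs("custom-linux-xl", "ubuntu-latest")
medium_savings = savings_vs("custom-linux-xl", "custom-linux-medium")
```

Here `ubuntu_savings` comes out to roughly 0.86 and `medium_savings` to roughly 0.66, in line with the ~85% and ~66% figures quoted above.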
Signed-off-by: Ryan Cragun <me@ryan.ec>
* VAULT-29583: Modernize default distributions in enos scenarios
Our scenarios have been running the last gen of distributions in CI.
This updates our default distributions as follows:
- Amazon: 2023
- Leap: 15.6
- RHEL: 8.10, 9.4
- SLES: 15.6
- Ubuntu: 20.04, 24.04
With these changes we also unlock a few new variant combinations:
- `distro:amzn seal:pkcs11`
- `arch:arm64 distro:leap`
We also normalize our distro key for Amazon Linux to `amzn`, which
matches the uname output on both versions that we've supported.
Signed-off-by: Ryan Cragun <me@ryan.ec>
In order to take advantage of enos' ability to outline scenarios and to
inventory what verification they perform we needed to retrofit all of
that information to our existing scenarios and steps.
This change introduces an initial set of descriptions and verification
declarations that we can continue to refine over time.
As doing this required that I re-read every scenario in its entirety I
also updated and fixed a few things that I noticed along the way,
including adding a few small features to enos that we use to make
handling initial versions programmatic between versions instead of
having a delta between our globals in each branch.
* Update autopilot and in-place upgrade initial versions
* Programmatically determine which initial versions to use based on Vault
version
* Partially normalize steps between scenarios to make comparisons easier
* Update the MOTD to explain that VAULT_ADDR and VAULT_TOKEN have been
set
* Add scenario and step descriptions to scenarios
* Add initial scenario quality verification declarations to scenarios
* Unpin Terraform in scenarios as >= 1.8.4 should work fine
* Automate feature changelog checking
* Add changelog for testing
* Simplify check
* Forgot the end of line thing
* Escape the characters
* More testing
* Last test?
* Delete test changelog
Fix a Node deprecation warning by updating our actions-slack-status to
v2.0.1, which pulls in a newer version of the github-script action. The
older version of that action caused the deprecation warning.
Signed-off-by: Ryan Cragun <me@ryan.ec>
* onboard to use backport-assistant with lts support
* add active releases manifest file
* fix CE active release versions
* update manifest and backport files for 0.4.1 bpa version
* remove BACKPORT_LABEL_TEMPLATE
* remove extra container
* separate backport.yml files
---------
Co-authored-by: Jeanne Franco <jeanne.franco@hashicorp.com>
Update hashicorp/actions-packaging-linux to our rewritten version
that no longer requires building a Docker container or relies on code
hosted in a non-hashicorp repo for packaging.
As internal actions are not managed in the same manner as external
actions via the tsccr trusted components db, the tsccr helper is
unable to easily re-pin hashicorp/* actions. As such, we unpin some
pinned hashicorp/* actions to automatically pull in updates that are
compatible.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Update the GitHub Actions pins to use the next generation of actions
that are supported by CRT.
In some cases these are simply to resolve Node 16 deprecations. In
others, we can now use `actions/upload-artifact@v4` and
`actions/download-artifact@v4` since the next generation of actions like
`hashicorp/actions-docker-build@v2` and
`hashicorp/actions-persist-metadata@v2` use the `v4` versions of these.
Signed-off-by: Ryan Cragun <me@ryan.ec>