Update the base images for all scenarios:
- RHEL: upgrade the 10 base image to 10.1
- RHEL: upgrade the 9 base image to 9.7
- SLES: upgrade the 15 base image to 15.7
- SLES: add SLES 16.0 to the matrix
- OpenSUSE: remove OpenSUSE Leap from the matrix
I ended up removing OpenSUSE Leap because the images we were on were rarely updated, which made those scenarios very slow due to package upgrades. Also, despite the latest Leap release landing in October, I didn't find any public cloud images produced for the new version. We can consider adding it back later, but I'm comfortable leaving SLES 15 and 16 in there for that test coverage.
I also ended up fixing a bug in our integration host setup where we'd provision three nodes instead of one. That ought to result in far fewer instance provisions per scenario. I also had to make a few small tweaks to how we detect whether or not SELinux is enabled, as the prior implementation did not work on SLES 16.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* use 'stable' instead of .go-version for the security scanner
If we don't do this, the security scanner might fail to run because it
would use a different version of Go than the one pinned on whatever
release branch it is running on.
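A minimal sketch of the change, assuming the scanner job selects its Go version with `actions/setup-go` (the step layout here is illustrative, not the actual workflow):

```yaml
# Before: pinned to the branch's .go-version file, which can lag
# behind what the scanner supports on older release branches.
- uses: actions/setup-go@v5
  with:
    go-version-file: .go-version

# After: always resolve the latest stable Go release.
- uses: actions/setup-go@v5
  with:
    go-version: stable
```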
* update branches the scanner runs on
Co-authored-by: Josh Black <raskchanky@gmail.com>
Fix an incompatibility where we check out the repository with
checkout@v6 and then attempt to check it out again at checkout@v5 in the
set-product-version action.
* update enos directory to trigger lint
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
One feature of on-demand self-hosted runners is that we don't contend
with other repositories for self-hosted runners. The penalty for using
on-demand is that there are no hot runner pools, so provisioning time
is usually around 30 seconds but in the worst case can hit the
two-minute mark. Those numbers rely on immediate capacity in the
default region (us-west-2). Every once in a while we see runner
provisioning times for on-demand CI runners go into the tens of
minutes, presumably due to capacity issues. Instead of waiting around
for a runner matching our single instance type, we'll add a few
fallback types we can attempt if we hit a capacity snag on our
preferred machine.
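The fallback list might be expressed roughly like this in the runner provisioner configuration (the key names and instance types below are hypothetical, not the actual values):

```yaml
# Hypothetical on-demand runner pool definition.
runner-pool:
  region: us-west-2
  instance-types:
    - m5.large   # preferred type
    - m5a.large  # fallbacks attempted on capacity errors
    - m6i.large
```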
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
This started as a fix to remove a trailing `"` that would show up when
UI tests failed. Since I was here, I also normalized our emoji to use
`flashing-light` instead of `rotating_light`, because the former
renders better in the new Slack instance.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* actions(setup-enos): update action-setup-enos to pull in enos 0.0.34 (#10561)
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* install sqlc before building vcm
* make a meaningless change to trigger CI
* turn off the go.work file
* remove test comment
Co-authored-by: Josh Black <raskchanky@gmail.com>
When a pull request is created against a CE branch and it changes any files in the `gotoolchain` group, we'll automatically diff every Go module file in the repo against its equivalent in the corresponding enterprise branch. If there's a delta in the equivalent configuration, the `build/ce-checks` job will automatically fail. The job also writes a complete explanation of the diff to the step output and to the `build/ce-checks` job step summary.
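The check can be sketched as a workflow step along these lines (the branch name, step name, and shell details are assumptions, not the actual implementation):

```yaml
- name: Diff Go module files against the enterprise branch
  shell: bash
  env:
    ENT_BRANCH: main-ent # assumed name of the enterprise branch
  run: |
    git fetch origin "$ENT_BRANCH"
    failed=0
    for mod in $(git ls-files '*.mod' '*.sum'); do
      if ! git diff --quiet "origin/$ENT_BRANCH" -- "$mod"; then
        {
          echo "## $mod differs from $ENT_BRANCH"
          git diff "origin/$ENT_BRANCH" -- "$mod"
        } >> "$GITHUB_STEP_SUMMARY"
        failed=1
      fi
    done
    exit "$failed"
```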
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* license: update headers to IBM Corp.
* `make proto`
* update offset because source file changed
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
Migrate all slack notifications to the `ibm-hashicorp` workspace. This
required creating three new `incoming-webhook` configurations which are
capable of posting into three different Slack channels, depending on the
workflow.
As they all use the `incoming-webhook` event, many of our integrations
had to be migrated from `chat.postMessage` and those changes are
reflected here.
Of note, there are lots of changes to the `release-procedure-ent`
workflow, as it has by far the most uses of the Slack integrations. In
some cases the changes were to appease `actionlint`; in others I made
small idiomatic tweaks. I translated all of the payload messages to
YAML instead of JSON, which fits better into our existing workflows,
and also because most of the payload messages were invalid JSON
altogether.
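For illustration, an `incoming-webhook` payload expressed as YAML might look like this (the action version, secret name, and message content are assumptions):

```yaml
- uses: slackapi/slack-github-action@v2
  with:
    webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
    webhook-type: incoming-webhook
    # The payload is YAML, so actionlint/yamllint can validate it,
    # unlike the hand-written JSON strings it replaces.
    payload: |
      text: "Release step failed"
      blocks:
        - type: section
          text:
            type: mrkdwn
            text: "*release-procedure-ent*: a step failed"
```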
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
Our service users now have compatible use cases that allow us to use
the service user credentials everywhere. Drop `action-doormat` so that
our workflows execute correctly in the `hashicorp/vault` context.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* SECVULN-22299: Use Doormat GitHub Action in CI
* remove step id
* remove step id
* grab aws account id in separate step
* add oidc perms
* add perms for other workflows
* remove usages of aws login creds
* add conditions for CE vs ent
* fix lint
* test perms
* add perms
* fix metadata
* update role arn
* use ci role arn
* print secret
* try again
* try workaround
* update all arns
* remove echo step
* cleanup
* cleanup
* address feedback
* re-add perms
* use service account
* fix conflict
* address feedback
* add read permission
* use write-all
* expose role arn
Co-authored-by: Charles Nwokotubo <charles.nwokotubo@hashicorp.com>
* actions(install-tools): include os and arch in cache key
When caching and/or restoring our tools we should include the OS and
architecture in the key to ensure we don't accidentally download the
wrong tools on different runners.
We also update the nightlies to specifically cache arm64 before running
the tests.
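Concretely, the cache key might change along these lines (the path and key prefix are illustrative):

```yaml
- uses: actions/cache@v4
  with:
    path: .tools/bin
    # Include OS and architecture so an arm64 runner never restores
    # x86_64 binaries cached by a different runner type.
    key: tools-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('tools/**') }}
```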
* actionlint: add arm self-hosted runner keys
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* actions: use self-hosted runners in hashicorp/vault
While it is recommended that we use self-hosted runners for every
workflow in private and internal accounts, this change was primarily
motivated by different runner types using different cache paths. By
using the same runner type everywhere we can avoid double caches of the
internal Vault tools.
* disable the terraform wrapper in ci-bootstrap to handle updated action
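That tweak likely amounts to a single input on the setup action (the version pin here is illustrative):

```yaml
- uses: hashicorp/setup-terraform@v3
  with:
    # The wrapper intercepts terraform's stdout/stderr to populate
    # step outputs, which breaks scripts that parse the output directly.
    terraform_wrapper: false
```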
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* [VAULT-39671] tools: use github cache for external tools
We currently have some ~13 tools that we need available both locally for
development and in CI for building, linting, formatting, and testing Vault.
Each branch we maintain uses mostly the same set of tools, but often pinned
to different versions.
For development, we have a `make tools` target that will execute the
`tools/tool.sh` installation script for the various tools at the correct pin.
This works well enough but is cumbersome if you’re working across many branches
that have divergent versions.
For CI the problem is speed and repetition. For each build job (~10) and Go test
job (16-52) we have to install most of the same tools. Because we have
extremely limited GitHub Actions cache, we can't afford to cache the entire
Vault Go build cache, so building the tools from source each time incurs the
penalty of downloading all of the modules and compiling each tool. That adds
about two extra minutes per job. We've worked around this problem by writing
composite actions that download pre-built binaries of the same tools instead
of building them from source, which usually takes a few seconds. The downside
of that approach is rate limiting, which GitHub has become much more
aggressive about enforcing.
That leads us to where we are before this work:
- For builds in the compatibility Docker container: the tools are built from
source and cached as a separate builder image layer. (usually fast as we get
cache hits, slow on cache misses)
- For builds that compile directly on the runner: the tools are installed on
each job runner by composite github actions (fast, uses API requests, prone
to throttling)
- For tests, they use the same composite actions to install the tools on each
job. (fast, uses API requests, prone to throttling)
This also leads to inconsistencies since there are two sources of truth: the
composite actions have their own version pin outside of those in `tools.sh`.
This has led to drift.
We previously tried to save some API requests by moving all builds into
the container. That almost works, but Docker's build container had a hard
time with some esoteric builds. We could special-case those, but it's a
bandaid at best.
A prior version of this work (VAULT-39654) investigated using `go tool`, but
there were some showstopper issues with that workflow that made it a
non-starter for us. Instead, we'll lean more heavily on the Actions cache to
resolve the throttling. This allows us to have a single source of truth for
tools and their pins, and affords us the same speed on cache hits as we had
previously, without downloading the tools from GitHub releases thousands of
times per day.
We add a new composite GitHub action for installing our tools.
- On cache misses it builds the tools and installs them into a cacheable path.
- On cache hits it restores the cacheable path.
- It adds the tools to the GITHUB_PATH to ensure runner-based jobs can find
them.
- For Docker builds it mounts the tools at `/opt/tools/bin`, which is
part of the PATH in the container.
- It uses a cache key of the SHA of the tools directory along with the
working directory SHA, which is required to deal with actions/cache
issues.
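Putting that together, the composite action's steps might look roughly like this (paths, step names, and the exact key format are assumptions):

```yaml
steps:
  - id: tools-cache
    uses: actions/cache@v4
    with:
      path: /opt/tools/bin
      key: tools-${{ runner.os }}-${{ runner.arch }}-${{ hashFiles('tools/**') }}
  - name: Build tools on a cache miss
    if: steps.tools-cache.outputs.cache-hit != 'true'
    shell: bash
    run: ./tools/tool.sh install
  - name: Make the tools visible to later steps
    shell: bash
    run: echo "/opt/tools/bin" >> "$GITHUB_PATH"
```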
This results in:
- A single source of truth for tools and their pins
- A single cache for tools that can be re-used between all CI and build jobs
- No more GitHub API calls for tooling. _Rate limiting will be a thing of
the past._
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
Add a new `github` changed file group that includes everything in the
`.github` directory. Further refine the `pipeline` group to only
include scripts, workflows, and actions files in `.github`. We also move
the `CODEOWNERS` file into `.github/` to simplify `github` grouping.
Because the `build` logic responds to changes in the `pipeline` group, this
means we will no longer build and test everything for simple changes in
`github` that don't affect the pipeline.
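The refined grouping might be sketched like this (the group file format is hypothetical; only the path patterns reflect the described behavior):

```yaml
groups:
  github:
    - '.github/**'
  pipeline:
    - '.github/scripts/**'
    - '.github/workflows/**'
    - '.github/actions/**'
```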
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
[VAULT-39160] actions(hcp): add support for testing custom images on HCP (#9345)
Add support for running the `cloud` scenario with a custom image in the
int HCP environment. We support two new tags that trigger new
functionality. If the `hcp/build-image` tag is present on a PR at the
time of `build`, we'll automatically trigger a custom build for the int
environment. If the `hcp/test` tag is present, we'll trigger a custom
build and run the `cloud` scenario with the resulting image.
* Fix a bug in our custom build pattern to handle prerelease versions.
* pipeline(hcp): add `--github-output` support to `show image` and
`wait image` commands.
* enos(hcp/create_vault_cluster): use a unique identifier for HVN
and vault clusters.
* actions(enos-cloud): add workflow to execute the `cloud` enos
scenario.
* actions(build): add support for triggering a custom build and running
the `enos-cloud` scenario.
* add more debug logging and query without a status
* add shim build-hcp-image for CE workflows
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
Update our pins to the latest versions. Essentially all of these relate to
actions needing to run on Node 24. Both our self-hosted runners and the
GitHub-hosted runners we use are on a new enough version of
actions/runner that it shouldn't be a problem.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
* license: add support for publishing artifacts to IBM PAO (#8366)
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: brian shore <bshore@hashicorp.com>
Co-authored-by: Ethel Evans <ethel.evans@hashicorp.com>
Co-authored-by: Ryan Cragun <me@ryan.ec>
As part of this we also update the pin of gotestsum to 1.12.3 to allow
for building it with Go 1.25.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>