39 Commits

Author SHA1 Message Date
Kuba Wieczorek
619843589b
CI: Pin VCM version used in Run Autopilot upgrade tests workflow (#28820) 2024-10-31 16:51:04 +00:00
Ryan Cragun
31b139c8ce
pipeline: include the version in the dynamic config key (#28793)
Cache scopes allow other branches to inherit default branch scopes,
which means that release branches can restore a key from main. Instead,
we now include the vault version as part of the cache key to ensure
we don't include versions that are incompatible with our version.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-10-29 16:02:00 +00:00
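
A minimal sketch of the version-scoped cache key described above. The version source, the weekly suffix, and the key layout are assumptions for illustration, not the actual workflow wiring.

```sh
# Compose a cache key that includes the Vault version so a release branch can
# never restore dynamic config that was cached from main (illustrative only).
VAULT_VERSION="$(cat version/VERSION)"   # assumed version source
WEEK="$(date +%G-%V)"                    # rotate weekly, matching the cadence noted below
echo "cache-key=enos-dynamic-config-${VAULT_VERSION}-${WEEK}" >> "$GITHUB_OUTPUT"
```
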
Ryan Cragun
ce5885279b
VAULT-31181: Add pipeline tool to Vault (#28536)
As the Vault pipeline and release processes evolve over time, so too must the tooling that drives them. Historically we've utilized a combination of CI features and shell scripts wrapped into make targets to drive our CI. While this
approach has worked, it requires careful consideration of what features to use (bash in CI almost never matches bash on developer machines, etc.) and often requires a deep understanding of several CLI tools (jq, etc.). `make` itself also has limitations in user experience, e.g. passing flags.

As we're all in on Github Actions as our pipeline coordinator, continuing to utilize and build CLI tools to perform our pipeline tasks makes sense. This PR adds a new CLI tool called `pipeline` which we can use to build new isolated tasks that we can string together in Github Actions. We intend to use this utility as the interface for future release automation work, see VAULT-27514.

For the first tasks in this new `pipeline` tool, I've chosen to build two small sub-commands:

* `pipeline releases list-versions` - Lists Vault versions within a range. The range is configurable either by setting `--upper` and/or `--lower` bounds, or by using `--nminus` to go back N-X versions from the current branch's version. As CE and ENT do not have version parity we also consider the `--edition` flag, along with zero or more `--skip` flags to exclude specific versions.

* `pipeline generate enos-dynamic-config` - Creates dynamic enos configuration based on the branch and the current list of release versions. It takes largely the same flags as the `releases list-versions` command; however, it also expects a `--dir` for the enos directory and a `--file` where the dynamic configuration will be written. This allows us to dynamically feed the latest versions into our sampling algorithm to get coverage over all supported prior versions.

We then integrate these new tools into the pipeline itself and cache the dynamic config on a weekly basis. We also cache the `pipeline` tool itself, as it will likely become a repository for pipeline-specific tooling. The caching strategy for the `pipeline` tool will make most workflows that require it very fast.


Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-10-23 15:31:24 -06:00
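
Illustrative invocations of the two sub-commands, with hypothetical flag values; the exact syntax may differ from the shipped tool.

```sh
# List CE versions from N-3 up to the current branch's version, skipping one release.
pipeline releases list-versions --edition ce --nminus 3 --skip 1.16.2

# Generate dynamic enos configuration for the same range into the enos directory.
pipeline generate enos-dynamic-config --edition ce --nminus 3 \
  --dir ./enos --file enos-dynamic-config.hcl
```
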
Kuba Wieczorek
80729f063f
[VAULT-28762] Run Autopilot upgrade tests on main and PRs to main on ENT if the AP code has changed (#28697)
Co-authored-by: Josh Black <raskchanky@gmail.com>
2024-10-14 16:59:00 +01:00
Steven Clark
8fec0056c1
Update buf to 1.45.0 (#28632) 2024-10-08 15:02:15 -06:00
Ryan Cragun
b6145bc3bb
protobuf: rebuild protos with protobuf 1.35.1 (main) (#28617)
* protobuf: rebuild protos with protobuf 1.35.1
* protobuf: unpin protoc-gen-go-grpc on main

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-10-07 14:54:51 -06:00
Ryan Cragun
c8e6169d5d
VAULT-31402: Add verification for all container images (#28605)
* VAULT-31402: Add verification for all container images

Add verification for all container images that are generated as part of
the build. Before this change we only ever tested a limited subset of
"default" containers based on Alpine Linux that we publish via the
Docker hub and AWS ECR.

Now we support testing all Alpine and UBI based container images. We
also verify the repository and tag information embedded in each by
deploying them and verifying the repo and tag metadata match our
expectations.

This does change the k8s scenario interface quite a bit. We now take in
an archive image and set image/repo/tag information based on the
scenario variants.

To enable this I also needed to add `tar` to the UBI base image. It was
already available in the Alpine image and is used to copy utilities to
the image when deploying and configuring the cluster via Enos.

Since some images contain multiple tags we also add samples for each
image and randomly select which variant to test on a given PR.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-10-07 10:16:22 -06:00
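
As a rough illustration of the repo/tag verification described above, one could load a built image archive and confirm the expected metadata; the real checks run inside the Enos k8s scenario, and the variable names here are assumptions.

```sh
# Load the image archive produced by the build, then confirm the expected
# repository and tag are embedded in the image metadata.
docker load --input "${IMAGE_ARCHIVE}"
docker image inspect --format '{{ .RepoTags }}' "${EXPECTED_REPO}:${EXPECTED_TAG}"
```
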
Luis (LT) Carbonell
146ad63256
Add build for FIPS ARM Docker images (#28310)
* Add build for FIPS ARM Docker images

* arm64 build
2024-09-11 15:07:34 -04:00
Ryan Cragun
b5d32b7bec
enos: add shfmt formatting to enos module scripts (#28142)
Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-08-23 13:45:30 -06:00
akshya96
ae6854e9f2
updating remaining occurrences of setup-go (#28110) 2024-08-16 13:00:48 -07:00
akshya96
62e0e62742
updating remaining occurrences of upload-artifact (#28108) 2024-08-16 11:46:08 -07:00
Steven Clark
297a9831f1
Pin protoc-gen-go-grpc to 1.4.0 (#27892)
* Pin protoc-gen-go-grpc to 1.4.0

They introduced a replace statement within the go.mod file which
causes failures when running go install protoc-gen-go-grpc@latest.

The workaround for now is to pin to the previous version.

See https://github.com/grpc/grpc-go/issues/7448

* Add the missing v to the version (v1.4.0 instead of 1.4.0) within tools/tools.sh
2024-07-29 14:36:43 +00:00
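
Roughly what the pin amounts to; the tools.sh wiring isn't shown and the exact install step is an assumption.

```sh
# Installing @latest fails because of the upstream replace directive,
# so pin to the previous release instead.
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@v1.4.0
```
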
Kuba Wieczorek
5d172d5861
[VAULT-28666] Use the retry script to check release version for gotestsum in CI (#27878) 2024-07-26 16:41:01 +00:00
Kuba Wieczorek
7a4cf3d273
[VAULT-28666] Use the retry script to check release versions for external tools installed in CI (#27873) 2024-07-26 10:17:32 -04:00
Kuba Wieczorek
920c08966c
[VAULT-28666] Enable the --clobber flag on GitHub CLI release downloads in CI to avoid errors when retrying (#27852) 2024-07-24 12:24:30 +01:00
Kuba Wieczorek
b7d9008e5b
[VAULT-28666] Retry tool download from GitHub releases on failure in GitHub Actions (GHA) (#27786) 2024-07-16 09:07:30 +01:00
Kuba Wieczorek
d9cd3a094a
[VAULT-28666] Retry staticcheck download on failure in GitHub Actions (GHA) (#27781) 2024-07-15 13:19:16 -04:00
Josh Black
56b32081f0
add a retry-command script (#27754)
* add a retry-command script

* add license header to retry script
2024-07-12 13:18:41 -07:00
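
A minimal sketch of a retry wrapper in the spirit of the script above; the real retry-command script's interface (attempt count, backoff, environment variables) may differ.

```sh
#!/usr/bin/env bash
# Usage: retry.sh <command> [args...]
set -euo pipefail

tries=5
delay=5
for ((i = 1; i <= tries; i++)); do
  if "$@"; then
    exit 0
  fi
  echo "attempt ${i}/${tries} failed" >&2
  if ((i < tries)); then
    sleep "${delay}"
  fi
done
exit 1
```
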
Violet Hynes
64ce6e74da
Update actions/checkout to 4.1.7 (#27636) 2024-07-02 09:25:21 -04:00
Ryan Cragun
ad5ca3e7b7
actions: use the Github API for pull request labels (#27603)
We have seen instances where the github.event.pull_request.labels.*.name
context in Github Actions doesn't actually include the labels.

We now pull and parse the labels ourselves via the Github API instead of
relying on that context.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-06-25 16:32:12 -06:00
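
One way to fetch labels straight from the API rather than trusting the event context; `PR_NUMBER` is a placeholder here and the actual step may parse the response differently.

```sh
# Query the pull request's labels directly from the GitHub API.
gh api "repos/${GITHUB_REPOSITORY}/issues/${PR_NUMBER}/labels" --jq '.[].name'
```
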
Ryan Cragun
f7c16796ed
lint: fix misspell linter install (#27408)
It appears that starting with v0.5.2 the misspell linter embeds the
version directory into the release archive.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-06-07 19:14:20 -04:00
Bianca Moreira
db388a5ecd
legal: include license in release zip and docker image (#26801)
* legal: include license in release zip and docker image

* Move license logic to script

* Add cp license to build vault action

* test

* Trigger Build
2024-05-17 17:18:38 +02:00
Ryan Cragun
fc4042bd2e
[QT-687] use new packaging action (#26905)
Update hashicorp/actions-packaging-linux to our rewritten version
that no longer requires building a Docker container or relying on code
hosted in a non-hashicorp repo for packaging.

As internal actions are not managed in the same manner as external
actions via the tsccr trusted components db, the tsccr helper is
unable to easily re-pin hashicorp/* actions. As such, we unpin some
pinned hashicorp/* actions to automatically pull in updates that are
compatible.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-05-10 16:51:06 +00:00
Ryan Cragun
842dff8342
[QT-711] actions: use next generation CRT actions (#26882)
Update the Github Actions pins to use the next generation of actions
that are supported by CRT.

In some cases these are simply to resolve Node 16 deprecations. In
others, we can now use `action/upload-artifact@v4` and
`actions/download-artifact@v4` since the next generation of actions like
`hashicorp/actions-docker-build@v2` and
`hashicorp/actions-persist-metadata@v2` use the `v4` versions of these.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-05-08 15:17:20 -06:00
Ryan Cragun
1f2f3ff20a
[QT-711] Pin to latest github actions (#26789)
Pin to the latest actions in preparation for the migration to
`actions/upload-artifact@v4`, `actions/download-artifact@v4`, and
`hashicorp/actions-docker-build@v2` on May 6 or 7.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-05-02 13:29:20 -06:00
Steven Clark
3140dbe209
Adapt CI to use new filenames for misspell releases (#26506) 2024-04-18 17:11:07 +00:00
Christopher Swenson
961bf20bdb
Use enumer to generate String() methods for most enums (#25705)
We have many hand-written String() methods (and similar) for enums.
These require more maintenance and are more error-prone than using
automatically generated methods. In addition, the auto-generated
versions can be more efficient.

Here, we switch to https://github.com/loggerhead/enumer, a fork of the
no-longer-maintained https://github.com/diegostamigni/enumer, which in
turn forks the mostly standard tool
https://pkg.go.dev/golang.org/x/tools/cmd/stringer.
We use this fork of enumer for Go 1.20+ compatibility and because
we require the `-transform` flag to be able to generate
constants that match our current code base.

Some enums were not targeted for this change:
2024-04-17 11:14:14 -07:00
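
For illustration, a typical enumer invocation; the type name, transform value, and output file are assumptions, and in practice this would usually be wired up via a `//go:generate` directive next to the enum definition.

```sh
# Generate String() and related methods for an enum type, using snake_case names.
enumer -type=OperationKind -transform=snake -output operation_kind_string.go
```
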
Ryan Cragun
a79d8a3f69
ci: install gosimports (#25400)
https://github.com/hashicorp/vault/pull/25383 added gosimports to the list
of required external tools. The precheck for some linting workflows fails
because we didn't add a corresponding workflow to install it.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-02-14 09:56:28 -07:00
Ryan Cragun
d255cb86b2
build: use the version with metadata in our metadata file (#25372)
Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-02-12 21:24:11 +00:00
Ryan Cragun
89c75d3d7c
[QT-637] Streamline our build pipeline (#24892)
Context
-------
Building and testing Vault artifacts on pull requests and merges is
responsible for about 1/3rd of our overall spend on Vault CI. Of the
artifacts that we ship as part of a release, we do Enos testing scenarios
on the `linux/amd64` and `linux/arm64` binaries and their derivative
artifacts. The extended build artifacts for non-Linux platforms or less
common machine architectures are not tested at this time. They are built,
notarized, and signed as part of every pull request update and merge. As
we don't actually test these artifacts, the only gain we get from this
rather expensive behavior is that we won't merge a change that would prevent
Vault from building on one of the extended targets. Extended platform or
architecture changes are quite rare, so performing this work as frequently
as we do is costly in both monetary and developer time for little relative
safety benefit.

Goals
-----
Rethink and implement how and when we build binaries and artifacts of Vault
so that we can spend less money on repetitive work while also reducing
the time it takes for the build and test pipelines to complete.

Solution
--------
Instead of building all release artifacts on every push, we'll opt to build
only our testable (core) artifacts. With this change we are introducing a
bit of risk. We could merge a change that breaks an extended platform and
only find out after the fact when we trigger a complete build for a release.
We'll hedge against that risk by building all of the release targets on a
scheduled cadence to ensure that they are still buildable.

We'll make building all of the targets optional on any pull request by
use of a `build/all` label on the pull request.

Further considerations
----------------------
* We want to reduce the total number of workflows and runners for all of our
  pipelines if possible. As each workflow runner has infrastructure cost and
  runner time penalties, using a single runner over many is often preferred.
* Many of our jobs runners have been optimized for cost and performance. We
  should simplify the choices of which runners to use.
* CRT requires us to use the same build workflow in both CE and Ent.
  Historically that meant that modifying `build.yml` in CE would result in a
  merge conflict with `build.yml` in Ent, and break our merge workflows.
* Workflow flow control in both `build.yml` and `ci.yml` can be quite
  complicated, as each needs to maintain compatibility whether executed as CE
  or Ent, and when triggered with various Github events like pull_request,
  push, and workflow_call, each with their own requirements.
* Many jobs utilize similar patterns of flow control and metadata but are not
  reusable.
* Workflow call depth has a maximum of four, so we need to be quite
  considerate when calling other workflows.
* Called workflows can only have 10 inputs.

Implementation
--------------
* Refactor the `build.yml` workflow to be agnostic to whether or not it is
  executing in CE or Ent. That makes future updates to the build much easier
  as we won't have to worry about merge conflicts when the change is merged
  downstream.
* Extract common steps in workflows into composite actions that we can reuse.
* Fix bugs where some but not all workflows would use different Git
  references when building and testing a pull request.
* We rewrite the application, docs, and UI change helpers as a composite
  action. This allows us to re-use this logic to make consistent behavior
  choices across build and CI.
* We combine several `build.yml` and `ci.yml` jobs into our final job.
  This reduces the number of workflows required for the same behavior while
  saving time overall.
* Update most of our action pins.

Results
-------

| Metric            | Before   | After   | Diff  |
|-------------------|----------|---------|-------|
| Duration:         | ~14-18m  | ~15-18m | ~ =   |
| Workflows:        | 43       | 18      | - 58% |
| Billable time:    | ~1h15m   | 16m     | - 79% |
| Saved artifacts:  | 34       | 12      | - 65% |

Infra costs should map closely to billable time.
Network I/O costs should map closely to the workflow count.
Storage costs should map directly with saved artifacts.

We could probably get parity with duration by getting more clever with
our UBI container build, as that's where we're seeing the increase. I'm
not yet concerned as it takes roughly the same time for this job to
complete as it did before.

While the CI workflow was not the focus of this PR, some shared
refactoring does show some marginal improvements there.

| Metric            | Before   | After    | Diff   |
|-------------------|----------|----------|--------|
| Duration:         | ~24m     | ~12.75m  | - 15%  |
| Workflows:        | 55       | 47       | - 8%   |
| Billable time:    | ~4h20m   | ~3h36m   | - 7%   |

Further focus on streamlining the CI workflows would likely result in a
few more marginal improvements, but nothing on the order like we've seen
with the build workflow.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-02-06 21:11:33 +00:00
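
A hypothetical sketch of the `build/all` gate described above; the actual implementation lives in workflow expressions and composite actions, and the variable names here are made up.

```sh
# Emit a flag that downstream jobs can use to decide whether to build the
# extended (non-Linux / uncommon architecture) targets for this pull request.
labels="$(gh api "repos/${GITHUB_REPOSITORY}/issues/${PR_NUMBER}/labels" --jq '.[].name')"
if grep -qx 'build/all' <<< "${labels}"; then
  echo "build-extended=true" >> "$GITHUB_OUTPUT"
else
  echo "build-extended=false" >> "$GITHUB_OUTPUT"
fi
```
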
Violet Hynes
a174ed395b
VAULT-23732 Update github actions to non-deprecated versions (#25203) 2024-02-06 10:49:40 -05:00
Nick Cabatoff
349a859d32
Ensure that on the Ent side private modules are downloaded without proxy (#25013) 2024-01-23 20:47:45 +00:00
Nick Cabatoff
84bc9d9979
Add support for checking out a different Go version than the standard… (#25010) 2024-01-23 20:03:48 +00:00
Ryan Cragun
9a10689ca3
[QT-645] Restructure dev tools (#24559)
We're on a quest to reduce our pipeline execution time to both enhance
our developer productivity and reduce the overall cost of the CI
pipeline. The strategy we use here reduces workflow execution time and
network I/O cost by reducing our module cache size and using binary
external tools when possible. We no longer download modules and build
many of the external tools thousands of times a day.

Our previous process of installing internal and external developer tools
was scattered and inconsistent. Some tools were installed via `go
generate -tags tools ./tools/...`,
others via various `make` targets, and some only in Github Actions
workflows. This process led to some undesirable side effects:
  * The modules of some dev and test tools were included with those
    of the Vault project. This led to us having to manage our own
    Go modules alongside those of external tools. Prior to Go 1.16 this
    was the recommended way to handle external tools, but now
    `go install tool@version` is the recommended way to handle
    external tools that need to be built from source, as it supports
    specific versions but does not modify the go.mod.
  * Due to Github cache constraints we combine our build and test Go
    module caches together, but having our developer tools as deps in
    our module results in a larger cache which is downloaded on every
    build and test workflow runner. Removing the external tools that were
    included in our go.mod reduced the expanded module cache size
    by ~300MB, thus saving time and network I/O costs when downloading
    the module cache.
  * Not all of our developer tools were included in our modules. Some were
    being installed with `go install` or `go run`, so they didn't take
    advantage of a single module cache. This resulted in us downloading
    Go modules on every CI and Build runner in order to build our
    external tools.
  * Building our developer tools from source in CI is slow. Where possible
    we can prefer to use pre-built binaries in CI workflows. No more
    module download or tool compiles if we can avoid them.

I've refactored how we define internal and external build tools
in our Makefile and added several new targets to handle both building
the developer tools locally for development and verifying that they are
available. This allows for an easy developer bootstrap while also
supporting installation of many of the external developer tools from
pre-build binaries in CI. This reduces our network IO and run time
across nearly all of our actions runners.

While working on this I caught and resolved a few unrelated issue:
* Both our Go and Proto format checks were being run incorrectly. In
  CI they were writing changes but not failing if changes were
  detected. The Go check was less of a problem as we have git hooks that
  are intended to enforce formatting; however, we drifted over time.
* Our Git hooks couldn't handle removing a Go file without failing. I
  moved the diff check into the new Go helper and updated it to handle
  removing files.
* I combined a few separate scripts into helpers and added a few
  new capabilities.
* I refactored how we install Go modules to make it easier to download
  and tidy all of the project's go.mod files.
* Refactor our internal and external tool installation and verification
  into a tools.sh helper.
* Combined more complex Go verification into `scripts/go-helper.sh` and
  utilize it in the `Makefile` and git commit hooks.
* Add `Makefile` targets for executing our various tools.sh helpers.
* Update our existing `make` targets to use new tool targets.
* Normalize our various scripts and targets output to have a consistent
  output format.
* In CI, install many of our external dependencies as binaries wherever
  possible. When not possible we'll build them from scratch but not mess
  with the shared module cache.
* [QT-641] Remove our external build tools from our project Go modules.
* [QT-641] Remove extraneous `go list` calls from our `set-up-go` composite
  action.
* Fix formatting and regen our protos

Signed-off-by: Ryan Cragun <me@ryan.ec>
2024-01-09 17:50:46 +00:00
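
A small sketch of the install-or-verify pattern described above, using a hypothetical tool and version; the real tools.sh covers many more tools and prefers pre-built release binaries in CI.

```sh
# Install a pinned external tool without touching the project's go.mod,
# and only build it from source when it isn't already on the PATH.
install_gofumpt() {
  go install mvdan.cc/gofumpt@v0.6.0
}

check_gofumpt() {
  if ! command -v gofumpt > /dev/null 2>&1; then
    install_gofumpt
  fi
}
```
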
hashicorp-copywrite[bot]
0b12cdcfd1
[COMPLIANCE] License changes (#22290)
* Adding explicit MPL license for sub-package.

This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository.

* Adding explicit MPL license for sub-package.

This directory and its subdirectories (packages) contain files licensed with the MPLv2 `LICENSE` file in this directory and are intentionally licensed separately from the BSL `LICENSE` file at the root of this repository.

* Updating the license from MPL to Business Source License.

Going forward, this project will be licensed under the Business Source License v1.1. Please see our blog post for more details at https://hashi.co/bsl-blog, FAQ at www.hashicorp.com/licensing-faq, and details of the license at www.hashicorp.com/bsl.

* add missing license headers

* Update copyright file headers to BUSL-1.1

* Fix test that expected exact offset on hcl file

---------

Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
Co-authored-by: Sarah Thompson <sthompson@hashicorp.com>
Co-authored-by: Brian Kassouf <bkassouf@hashicorp.com>
2023-08-10 18:14:03 -07:00
Ryan Cragun
8615b31598
ci: restore old timing before saving new cache (#22011)
* restore old timing results before saving new cache
* don't do an unnecessary checkout in set-up-go

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-07-21 10:27:33 -06:00
Ryan Cragun
1a46088afb
[QT-590] Optimize the CI testing workflow (#21959)
We further optimize the CI workflow for better cost and speed.
We tested the Go CI workflows across several instance classes
and updated our compute choices. We achieve an average execution
speed improvement of 2-2.5 minutes per test workflow while
reducing the infrastructure cost by about 20%. We also save
another ~2 minutes by installing `gotestsum` from the Github release
instead of downloading the Go modules and compiling it every time.

In addition to the speed improvements, we also further reduced our cache
usage by updating the `security-scan` workflow to not cache Go modules.
We also use the `cache/save` and `cache/restore` actions for timing
caches. This results in saving half as many cache entries for timing
data.

*UI test results*
results for 2x runs:
* c6a.2xlarge (12m54s, 11m55s)
* c6a.4xlarge (10m47s, 11m6s)
* c6a.8xlarge (11m32s, 10m51s)
* m5.2xlarge (15m23s, 14m16s)
* m5.4xlarge (14m48s, 12m54s)
* m5.8xlarge (12m27s, 12m24s)
* m6a.2xlarge (11m55s, 12m20s)
* m6a.4xlarge (10m54s, 10m43s)
* m6a.8xlarge (10m33s, 10m51s)

Current runner:
m5.2xlarge (15m23s, 14m16s, avg 14m50s) @ 0.448/hr = $0.11

Faster candidates
* c6a.2xlarge (12m54s, 11m55s, avg 12m24s) @ 0.3816/hr = $0.078
* m6a.2xlarge (11m55s, 12m20s, avg 12m8s) @ 0.4032/hr = $0.081
* c6a.4xlarge (10m47s, 11m6s, avg 10m56s) @ 0.7632/hr = $0.139
* m6a.4xlarge (10m54s, 10m43s, avg 10m48s) @ 0.8064/hr = $0.140

Best bang for the buck for test-ui:
  m6a.2xlarge, > 25% cost savings from current and we save ~2.5 minutes.

*Go test results*
During testing, the external replication tests, when not broken up, will
always take the longest. Our original analysis focuses on this job.
Most other test groups will finish ~3m faster, so we subtract
that time when estimating the cost for the whole job.

external replication job results:
* c6a.2xlarge (20m49s, 19m20s, avg 20m5s)
* c6a.4xlarge (19m1s, 19m38s, avg 19m20s)
* c6a.8xlarge (19m51s, 18m54s, avg 19m23s)
* m5.2xlarge (22m12s, 20m29s, avg 21m20s)
* m5.4xlarge (20m7s, 19m3s, avg 20m35s)
* m5.8xlarge (20m24s, 19m42s, avg 20m3s)
* m6a.2xlarge (21m10s, 19m37s, avg 20m23s)
* m6a.4xlarge (18m58s, 19m51s, avg 19m24s)
* m6a.8xlarge (19m27s, 18m47s, avg 19m7s)

There is little separation in time when we increase class size. In the
best case a class size increase yields about a ~5% performance increase
and doubles the cost. For test-go our best bang for the buck is
certainly going to be in the 2xlarge class.

Current runner:
m5.2xlarge (22m12s, 20m29s, avg 21m20s) @ 0.448/hr (16@avg-3m + 1@avg) = $2.35

Candidates in the same class
* c6a.2xlarge (20m49s, 19m20s, avg 20m5s) @ 0.3816/hr (16@avg-3m + 1@avg) = $1.86
* m6a.2xlarge (21m10s, 19m37s, avg 20m23s) @ 0.4032/hr (16@avg-3m + 1@avg) = $2.00

Best bang for the buck for test-go:
  c6a.2xlarge: 20% cost savings and save about ~2.25 minutes.

We ran the tests with similar instances and saw similar execution times as
with test-go. Therefore we can use the same recommended instance sizes.

After breaking up test-go's external replication tests, the longest group
was shorter on average. I chose to look at group 3 as it was usually the
longest grouping:

* c6a.2xlarge: (14m51s, 14m48s)
* c6a.4xlarge: (14m14s, 14m15s)
* c6a.8xlarge: (14m0s, 13m54s)
* m5.2xlarge: (15m36s, 15m35s)
* m5.4xlarge: (14m46s, 14m49s)
* m5.8xlarge: (14m25s, 14m25s)
* m6a.2xlarge: (14m51s, 14m53s)
* m6a.4xlarge: (14m16s, 14m16s)
* m6a.8xlarge: (14m2s, 13m57s)

Again, we see ~5% performance gains between the 2x and 8x instance classes
at quadruple the cost. The c6a and m6a families are almost identical, with
the c6a class being cheaper.

*Notes*
* UI and Go Test timing results: https://github.com/hashicorp/vault-enterprise/actions/runs/5556957460/jobs/10150759959
* Go Test with data race detection timing results: https://github.com/hashicorp/vault-enterprise/actions/runs/5558013192
* Go Test with replication broken up: https://github.com/hashicorp/vault-enterprise/actions/runs/5558490899

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-07-20 14:10:08 -06:00
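
Roughly the "install gotestsum from the Github release" idea mentioned above; the version, asset naming, and destination directory are assumptions.

```sh
# Fetch a pre-built gotestsum binary instead of compiling it on every runner.
GOTESTSUM_VERSION=1.11.0
curl -fsSL \
  "https://github.com/gotestyourself/gotestsum/releases/download/v${GOTESTSUM_VERSION}/gotestsum_${GOTESTSUM_VERSION}_linux_amd64.tar.gz" \
  | tar -xzf - -C "${HOME}/.local/bin" gotestsum
```
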
Ryan Cragun
a98c0d9cbe
actions: always cache all required Go modules (#21792)
* Make sure that we always download all of the required modules.
* Fix actions/set-up-go path for UI test
* Fix broken go.mod in hcp_link

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-07-12 20:21:09 +00:00
Ryan Cragun
c43345c452
[QT-589] Use the go module cache between CI and build (#21764)
In order to reliably store Go test times in the Github Actions cache we
need to reduce our cache thrashing by not using more than 10gb over all
of our caches. This change reduces our cache usage significantly by
sharing Go module cache between our Go CI workflows and our build
workflows. We lose our per-builder cache, which will result in a bit of a
performance hit, but we'll enable better automatic rebalancing of our CI
workflows. Overall we should see a per branch reduction in cache sizes
from ~17gb to ~850mb.

Some preliminary investigation into this new strategy:

Prior build workflow strategy on a cache miss:
  Download modules: ~20s
  Build Vault: ~40s
  Upload cache: ~30s
  Total: ~1m30s

Prior build workflow strategy on a cache hit:
  Download and decompress modules and build cache: ~12s
  Build Vault: ~15s
  Total: ~28s

New build workflow strategy on a cache miss:
  Download modules: ~20s
  Build Vault: ~40s
  Upload cache: ~6s
  Total: ~1m6s

New build workflow strategy on a cache hit:
  Download and decompress modules: ~3s
  Build Vault: ~40s
  Total: ~43s

Expected time if we used no Go caching:
  Download modules: ~20s
  Build Vault: ~40s
  Total: ~1m

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-07-12 17:55:16 +00:00
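
A hypothetical sketch of how a shared module-cache key could be derived so that CI and build workflows restore the same entry; the real key likely incorporates more inputs (the Go version, every go.mod/go.sum in the repo, and so on).

```sh
# Key the shared Go module cache on the dependency set so any workflow on the
# branch (CI or build) restores the same cache entry.
echo "go-mod-cache-key=go-modules-$(sha256sum go.sum | cut -d' ' -f1)" >> "$GITHUB_OUTPUT"
```
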