The old pipeline had a release job where mantle's plume release tool
was invoked to publish the cloud images.
Implement a release job in the new pipeline with the same goals and
eventually even more automation.
To make it easier to review the image changes and the changelog, and to
iterate on fixes without rebuilding the image, move this logic to its
own file that a new job can call.
Instead of printing failed tests like this:

  Failed tests: kubeadm.v1.25.0.cilium.base
  kubeadm.v1.24.1.cilium.base

Do it like this:

  Failed tests:
  kubeadm.v1.25.0.cilium.base
  kubeadm.v1.24.1.cilium.base
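A minimal sketch of the per-line printing, assuming the failed tests
are collected in a bash array (the variable name is an assumption):

  echo "Failed tests:"
  for failed_test in "${failed_tests[@]}"; do
    printf '%s\n' "${failed_test}"
  done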
We set success to true when the test cycle was broken, which was a
hacky way to avoid printing the give-up message. But setting success
to true also meant that the script returned with status 0, which is
wrong.
Add another variable to control printing the give-up message.
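A sketch of the separation, with all names assumed:

  success=false
  print_give_up=true
  for attempt in 1 2 3; do
    if run_tests; then
      success=true
      break
    fi
    if last_failure_is_final; then  # hypothetical check for aborting the cycle
      print_give_up=false
      break
    fi
  done
  if ! "${success}" && "${print_give_up}"; then
    echo "Giving up after ${attempt} attempts"
  fi
  "${success}"  # the script's exit status now reflects the actual result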
The new m3.small instance does not have official Flatcar support yet
but we can already cover it in our PXE boot release tests.
The c3.small instances are legacy and m3.small is the new smallest
type.
We were running the run_sdk_container script, passing the value of a
variable named version to the script through the -v flag. But the
variable is defined nowhere. This worked under Jenkins, because the
Jenkins job has a version parameter that gets exported into the
environment under the same name. But running the script manually
outside Jenkins revealed the bug.
The script should have been using a vernum variable. The difference
between this variable and the version variable is that "version" was
in the form <channel>-<version>-<build_id>, whereas "vernum" comes
without the channel part. Fortunately, "run_sdk_container" was
stripping the channel part before using this value, so it makes no
difference whether we pass main-3333.0.0-some-id or just
3333.0.0-some-id.
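The stripping can be pictured like this; a sketch, not the actual
run_sdk_container code:

  # strip a leading channel such as "main-" when one is present
  case "${version}" in
    alpha-*|beta-*|stable-*|lts-*|main-*)
      vernum="${version#*-}" ;;
    *)
      vernum="${version}" ;;
  esac
  # main-3333.0.0-some-id and 3333.0.0-some-id both end up as 3333.0.0-some-id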
Recently we changed the region from DA (Dallas) to DC (Washington),
because there are more ARM64 servers available. Reflect this change in
the new pipeline too.
When the build system runs the packages jobs for both architectures in
parallel and has to create a new tag, tagging fails due to a race
between the two jobs.
Move the git tagging to its own script that is run from a new top-level
job that starts the packages jobs for both architectures.
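A sketch of such a tagging script, made idempotent so that reruns and
already-tagged versions are safe; the script interface is an
assumption:

  #!/bin/bash
  set -euo pipefail
  tag="${1}"
  # only create and push the tag when it does not exist yet
  if ! git rev-parse --verify "refs/tags/${tag}" >/dev/null 2>&1; then
    git tag "${tag}"
    git push origin "refs/tags/${tag}"
  fi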
The image comparison was done against the old release in the channel
we release to instead of the previous release with the same major
version. This means that when a channel transition happens, we see a
large diff instead of the diff against the previous release. While not
bad for finding problems, this is normally not needed. However, we
want to have two changelogs generated: one against the old release in
the channel we release to, and one against the previous release with
the same major version when a transition happens. There was no
changelog printing yet; this is added now.
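A sketch of generating the two changelogs, with the tag variable names
assumed:

  # changelog against the previous release in the target channel
  git log --oneline "${prev_channel_tag}..${new_tag}" > changelog-channel.txt
  # on a channel transition, also diff against the previous release
  # with the same major version
  if [ "${prev_channel_major}" != "${new_major}" ]; then
    git log --oneline "${prev_same_major_tag}..${new_tag}" > changelog-major.txt
  fi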
* Add a SKIP_COPY_TO_BINCACHE environment variable that will skip
  uploading test results to bincache. This is useful if we want to
  upload test results as artifacts on GitHub (see the sketch after
  this list).
* Make QEMU_IMAGE_NAME configurable
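A sketch of both changes; the copy helper and the default image name
are assumptions:

  QEMU_IMAGE_NAME="${QEMU_IMAGE_NAME:-flatcar_production_qemu_image.img}"
  if [ -z "${SKIP_COPY_TO_BINCACHE:-}" ]; then
    copy_to_bincache "${test_results_dir}"  # hypothetical helper
  fi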
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
The new build pipeline already compresses images but uploaded both the
compressed and the uncompressed files because the whole build folder
gets uploaded.
Add a new flag "--only_store_compressed" to the image generation which
deletes the uncompressed file after compression is done. Uncompressed
images are still supported if specified in the flag
"image_compression_formats".
Closes https://github.com/flatcar-linux/Flatcar/issues/793
The original pipeline has package-diff commands to print out image
differences compared to the last release. This is used for the release
Go/No-Go QA checks.
Add the same logic to the new pipeline.
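Purely as an illustration of the idea (not the package-diff tool
itself), the comparison boils down to diffing the package lists of two
releases; the URLs and file names here are assumptions:

  base="https://bincache.flatcar-linux.net/images/${arch}"
  curl -fsSL "${base}/${old_version}/flatcar_production_image_packages.txt" -o old.txt
  curl -fsSL "${base}/${new_version}/flatcar_production_image_packages.txt" -o new.txt
  diff -u old.txt new.txt || true  # diff exits non-zero on differences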
The images job builds an image container that is multiple GB in size
and takes >10 mins to be loaded in the vms job. The vms job can also
do its work by running from the packages container from the packages
job, if it first fetches the built image from bincache, assuming the
images job copies it there.
Skip generating the image container and instead use the packages
container for VM image building, by copying the image folder to
bincache first and then retrieving it from there. While reworking
this, we also address the issue that the VMs container had used the
same name for both architectures, causing a race when both run in
parallel on the same worker.
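A sketch of the flow, with the paths and names assumed:

  # images job: publish the image folder
  copy_to_bincache "images/${arch}/${vernum}"  # hypothetical helper
  # vms job: run inside the packages container and fetch the image
  curl -fsSL -O \
    "https://bincache.flatcar-linux.net/images/${arch}/${vernum}/flatcar_production_image.bin.bz2"
  # use an arch-specific container name to avoid the race
  container_name="flatcar-vms-${arch}"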
It uses the SIGNER environment variable to decide whether the
signatures should be created or not. It expects the key of the SIGNER
to exist in GPGHOME, and that's what gpg_setup.sh is already doing.
In some places we need to recursively change the owner of the
directory that contains artifacts to be signed, otherwise we won't be
able to create new files with signatures there. This is because some
of the artifacts are either created inside the SDK container (so the
created files belong to root outside the container) or are created
with `sudo`.
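A sketch of the signing step under these constraints; the helper name
and the loop are illustrative, while the gpg flags are standard:

  sign_artifacts() {
    local dir="${1}"
    [ -z "${SIGNER:-}" ] && return 0
    # artifacts created inside the SDK container or via sudo belong to
    # root, so take ownership before writing signature files next to them
    sudo chown -R "${USER}:${USER}" "${dir}"
    local file
    for file in "${dir}"/*; do
      gpg --batch --local-user "${SIGNER}" --armor --detach-sign "${file}"
    done
  }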
The functions are sourcing other files that define global variables,
so these globals would spill into the caller's shell unnecessarily. We
will also add some functionality that uses traps in follow-up commits,
so it's good to limit the scope of the traps too.
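Defining the function body with parentheses instead of braces runs it
in a subshell, keeping sourced globals and traps contained. A sketch,
with the file and helper names assumed:

  image_build() (  # "(" instead of "{": the body runs in a subshell
    source ci-automation/image.sh   # sourced globals stay in the subshell
    tmpdir="$(mktemp -d)"
    trap 'rm -rf "${tmpdir}"' EXIT  # the trap stays local to the subshell
    build_image_into "${tmpdir}"    # hypothetical helper
  )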
The kola test run time shouldn't be longer than the GC duration to
prevent failing tests caused by GC interference.
Align the Azure kola timeout with the GC duration.
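One way to picture the cap, using a coreutils timeout wrapper; the
duration value and the wrapper are assumptions about how the pipeline
enforces it:

  gc_duration_mins=50  # assumed value; keep in sync with the GC configuration
  timeout "${gc_duration_mins}m" kola run --platform=azure "${test_pattern}"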
The Azure tests use a similar logic to the GCE tests, where the
instance type parameter normally used in the AWS/Equinix Metal tests
is used here to specify whether the VM gets started in Gen V1 or V2
mode.
Signed-off-by: Sayan Chowdhury <schowdhury@microsoft.com>
Co-authored-by: Kai Lüke <pothos@users.noreply.github.com>
A nightly build that pushes the version file to the branch was doing
so only at the end of the build, causing the push to fail if something
else got merged in between.
Push the version file early by generating it the same way it would be
generated by the run_sdk_container/bootstrap_sdk_container scripts.
In the case of the SDK, the version file gets the same version for the
OS and the SDK. Add some explanations about the version formats. Note
that the scripts will still rewrite the file, but it should be a no-op.
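A sketch of generating the file up front; the path and field names
follow the usual version.txt layout but are assumptions here:

  printf '%s\n' \
    "FLATCAR_VERSION=${vernum}+${build_id}" \
    "FLATCAR_VERSION_ID=${vernum}" \
    "FLATCAR_BUILD_ID=\"${build_id}\"" \
    "FLATCAR_SDK_VERSION=${sdk_vernum}" \
    > sdk_container/.repo/manifests/version.txt
  git add sdk_container/.repo/manifests/version.txt
  git commit -m "Set version for nightly build"
  git push origin HEAD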
The coreos/portage refs were allowed to be empty strings, but with the
way the function was run from Groovy, the lack of quoting turned the
empty strings into missing parameters.
Since the two parameters are meant to be optional, support omitting
them.
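A sketch of the bash side with both refs made optional; the function
name and repository paths are assumptions:

  update_refs() {
    local coreos_ref="${1:-}"   # optional: empty when omitted
    local portage_ref="${2:-}"  # optional: empty when omitted
    if [ -n "${coreos_ref}" ]; then
      git -C sdk_container/src/third_party/coreos-overlay checkout "${coreos_ref}"
    fi
    if [ -n "${portage_ref}" ]; then
      git -C sdk_container/src/third_party/portage-stable checkout "${portage_ref}"
    fi
  }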
`local -a stuff` does not make `stuff` a bound array variable, so
checking the length of the array will trigger an error about an
unbound variable. Fortunately, `local stuff=()` does the trick.
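A small demonstration (the exact failure depends on the bash version;
older bash under `set -u` is the problematic case):

  set -u
  f() {
    local -a stuff
    echo "${#stuff[@]}"  # bash < 4.4 errors: "stuff: unbound variable"
  }
  g() {
    local stuff=()
    echo "${#stuff[@]}"  # prints 0
  }
  g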
We forgot to clear the array of instance tests to rerun, so the list
grew from one iteration to the next when going over all the instance
types. I did not spot it before, because I tested it with only one
extra instance.
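A sketch of the fix, with the variable and helper names assumed:

  for instance_type in "${instance_types[@]}"; do
    tests_to_rerun=()  # reset per instance type so the list doesn't accumulate
    run_tests "${instance_type}"  # hypothetical; appends failures to tests_to_rerun
    if [ "${#tests_to_rerun[@]}" -gt 0 ]; then
      rerun_tests "${instance_type}" "${tests_to_rerun[@]}"  # hypothetical
    fi
  done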
The logic we had in some tests for covering different instance types
is now easier to reuse for testing the GVNIC mode in GCE.
Align the GCE test with AWS and DigitalOcean to test an additional
"instance type" (here just changing the NIC) and break the retest spin
in case it gets called for arm64.
The test framework from the AWS PR allows us to align the logic, which
also addresses some bugs we had here.
Port the Equinix Metal test over to the new framework (and also use
different test basenames per architecture while at it, which could
otherwise result in clashes).