146 Commits

Author SHA1 Message Date
Kai Lueke
8600ddb540 ci-automation/release: Set up secret envs 2022-09-22 18:45:49 +02:00
Kai Lueke
b2f19b2d4c ci-automation/release: Run plume release only once
We need to run plume only once for each arch, move it out of the loop.
Also, address some smaller things that shellcheck complains about.
2022-09-22 18:45:49 +02:00
Kai Lueke
760754e86f ci-automation/secret_to_file: Fix usage from subshell
This failed when used from ( secret_to_file ... VAR ; cat $VAR )
because ( ) starts a new subshell PID and secret_to_file's returned
/proc/PID/fd/X path was then using the wrong PID.
2022-09-22 18:45:49 +02:00
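The PID mismatch described in this commit can be reproduced in isolation. The following is a minimal sketch, assuming bash on Linux. It does not call secret_to_file and does not claim to show which variable the original code used; it only shows that inside `( )` the value of `$$` stays that of the parent shell while `BASHPID` reports the subshell's own PID, so a /proc/PID/fd/X path built from one of them can point at a different process than expected.

```shell
#!/bin/bash
# Illustration only: compare $$ and BASHPID inside and outside a subshell.
parent_pid="$$"
parent_bashpid="${BASHPID}"

# Inside ( ), $$ is still the PID of the original shell,
# while BASHPID is the PID of the forked subshell process.
read -r sub_dollar sub_bashpid < <( ( echo "$$ ${BASHPID}" ) )

echo "parent:   \$\$=${parent_pid} BASHPID=${parent_bashpid}"
echo "subshell: \$\$=${sub_dollar} BASHPID=${sub_bashpid}"
```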
Kai Lueke
6b0db859ce ci-automation/release: Disable GCS auth for plume pre-release
When GCS auth is expected, plume would upload the AMI list to GCS.
2022-09-22 18:45:49 +02:00
Mathieu Tortuyaux
6f529aa0ac release: get product IDs from Jenkins
The JSON object is passed from the Groovy script to the release script,
we just need to extract the correct AWS Marketplace product ID based on
the "<channel>-<arch>".

Exception for the stable-amd64 where we also need to get the stable-pro
product ID.

Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
2022-09-22 18:45:49 +02:00
Mathieu Tortuyaux
6df8555d1c sdk_container: publish the SDK on a Docker registry
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
2022-09-22 18:45:49 +02:00
Kai Lueke
869b0302b8 ci-automation/release.sh: Run plume to release cloud images
The mantle plume tool has two steps, pre-release is the mere upload and
release is the publication. In the past this was used to run the tests
in between but we don't do this anymore.
Run plume pre-release and release in a single job. Since plume can't
push to GCS in our case, we upload the files to bincache. Also do the
cloudformation update which was previously done in
flatcar-build-scripts but could only be run after the sync to Origin.
It requires the "aws" tool in the mantle container until we implement
this in plume directly.
2022-09-22 18:45:49 +02:00
Krzesimir Nowak
54b7bbd671 ci-automation: Implement a stricter image version check
I made a mistake and wrote a version like main-3363-0.0-stuff (note a
dash instead of a dot after the first number). Surprisingly the build
chugged along just fine almost until the end of the image job - it
detected invalid version string when the job wanted to create a
version.txt file:

ERROR   build_image: script called: build_image '--board=amd64-usr' '--group=developer' '--output_root=/home/sdk/build/images' '--only_store_compressed' '--torcx_root=/home/sdk/build/torcx' 'prodtar' 'container'
ERROR   build_image: Backtrace:  (most recent call is last)
ERROR   build_image:   file build_image, line 196, called: split_ver '3363' 'SPLIT'
ERROR   build_image:   file common.sh, line 192, called: die 'Invalid version string '3363''
ERROR   build_image:
ERROR   build_image: Error was:
ERROR   build_image:   Invalid version string '3363'

Let's have a stricter version check in the beginning of the build
process, so the process fails sooner rather than later.
2022-09-19 15:04:22 +02:00
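An early check of this kind could look roughly like the sketch below. It is not the check added by the commit: the function name `is_valid_image_version`, the list of channel names, and the exact pattern are assumptions made for illustration, based on the `<channel>-<version>-<build_id>` form mentioned elsewhere in this log.

```shell
#!/bin/bash
# Hypothetical helper, not the function from the commit.
# Accepts "<channel>-<major>.<minor>.<patch>[-<build_id>]".
# The channel list below is an assumption of this sketch.
is_valid_image_version() {
    local version="$1"
    local re='^(main|alpha|beta|stable|lts)-[0-9]+\.[0-9]+\.[0-9]+(-[A-Za-z0-9._-]+)?$'
    [[ "${version}" =~ ${re} ]]
}

# The second string has a dash instead of a dot after the first number.
for v in main-3363.0.0-stuff main-3363-0.0-stuff; do
    if is_valid_image_version "${v}"; then
        echo "ok:      ${v}"
    else
        echo "invalid: ${v}"
    fi
done
```

Running such a check before any build step means a malformed version stops the job immediately instead of near the end of the image job.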
Kai Lueke
29e1e94522 Use new github org name "flatcar"
The "flatcar-linux" github org was renamed to "flatcar". There are no
github redirections in place and we have to update all links.
2022-09-14 14:52:47 +02:00
Kai Lueke
62be70b658 Use ghcr.io/flatcar, there are no redirects
The GitHub org rename also moved the ghcr.io container image repo but
in contrast to git repos, there are no redirects!
2022-09-14 14:52:47 +02:00
Krzesimir Nowak
957c82b142 ci-automation: Change the way we prepare torcx manifest for testing
Now URLs for torcx packages are always present in the torcx manifest,
but for releases they may be pointing to the origin server where the
packages will be eventually uploaded. At the time of running the
tests, those packages are still only in the build cache, so change the
URLs to point to the build cache, so the test can pass.
2022-09-07 15:14:06 +02:00
Krzesimir Nowak
0b774f3fa8 *: Allow specifying extra URLs for torcx packages
Torcx manifest may contain paths and URLs as locations of
packages. There are two kinds of packages - vendored and
extra. Vendored packages normally have two locations - path to the
directory inside the image where the package is (which is why it's
called vendored), and a URL to the package on some remote
server. Extra packages only have a URL. But the URLs are added only
when we tell the build_torcx_store script to upload the packages at
the same time, which is what the old build pipeline was doing. With
the new pipeline, the upload happens as a separate step, thus the
upload is disabled when invoking build_torcx_store, and so the
packages are not getting URLs set. This change went unnoticed, because
a kola test checking the generated torcx manifest was only checking if
there is at least one location, either path or URL, and all the new
releases have no extra packages, only vendored ones.

When backporting the new pipeline to old LTS, the kola tests started
to fail, because old LTS had one extra package, and this is how I
noticed the problem.
2022-09-07 15:14:06 +02:00
Kai Lueke
45ec29a981 ci-automation: Prepare release job
The old pipeline had a release job where mantle's plume release tool
was invoked to publish the cloud images.
Implement a release job in the new pipeline with the same goals and
eventually even more automation.
2022-09-05 16:10:42 +02:00
Kai Lueke
4f3fd3889c ci-automation: Move image change report to own file
To review the image changes and the changelog more easily and in case
of fixes, iterate over it without rebuilding the image, move this logic
to its own file where a new job could call it.
2022-09-05 16:10:42 +02:00
Kai Lueke
96fc103372 Cover Equinix Metal m3.small.x86 instances in release test
The new m3.small instance does not have official Flatcar support yet
but we can already cover it in our PXE boot release tests.
The c3.small instances are legacy and m3.small is the new smallest
type.
2022-09-01 13:35:36 +02:00
Krzesimir Nowak
22cd00f1fc ci-automation: Use an array for storing failed tests 2022-08-31 12:11:37 +02:00
Krzesimir Nowak
084f6e28ca ci-automation: Print failed tests nicer
Instead of printing failed tests like this:

    Failed tests: kubeadm.v1.25.0.cilium.base
    kubeadm.v1.24.1.cilium.base

Do it like this:

    Failed tests:
    kubeadm.v1.25.0.cilium.base
    kubeadm.v1.24.1.cilium.base
2022-08-31 12:11:37 +02:00
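Together with the array change from the previous entry, the output format can be sketched as follows. The function name `print_failed_tests` is hypothetical and the real script may structure this differently.

```shell
#!/bin/bash
# Illustration: keep failed test names in an array and print one per line
# under a header, instead of appending the first name to the header line.
print_failed_tests() {
    local -a failed=("$@")
    if [[ "${#failed[@]}" -eq 0 ]]; then
        return 0
    fi
    echo "Failed tests:"
    printf '%s\n' "${failed[@]}"
}

print_failed_tests kubeadm.v1.25.0.cilium.base kubeadm.v1.24.1.cilium.base
```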
Krzesimir Nowak
841fe33020 ci-automation: Return 1 on broken cycle
We have set success to true when the test cycle was broken, which was
a hacky way to avoid printing the give up message. But this setting
success to true also meant that the script returned with status 0,
which is wrong.

Add another variable for controlling printing the give up message.
2022-08-31 12:11:37 +02:00
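The idea of separating "did the tests pass" from "should the give up message be printed" can be sketched like this. The function `run_with_retries` and the return code convention (0 passed, 1 failed, 2 broken) are assumptions of this example, not the actual script.

```shell
#!/bin/bash
# Hypothetical retry loop. Assumed return codes of the test command:
# 0 = passed, 1 = failed (a retry may help), 2 = broken (a retry is pointless).
run_with_retries() {
    local retries="$1"; shift
    local success=false
    local print_give_up=true
    local attempt rc
    for (( attempt = 1; attempt <= retries; attempt++ )); do
        rc=0
        "$@" || rc=$?
        if [[ "${rc}" -eq 0 ]]; then
            success=true
            break
        elif [[ "${rc}" -eq 2 ]]; then
            # Broken cycle: stop retrying and skip the give up message,
            # but leave success at false so the status is non-zero.
            print_give_up=false
            break
        fi
    done
    if ! "${success}" && "${print_give_up}"; then
        echo "Giving up after ${retries} attempts." >&2
    fi
    "${success}"
}
```

With two variables, a broken cycle no longer has to masquerade as a success just to suppress the message.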
Krzesimir Nowak
731cab027b ci-automation: Break test cycle properly
Create a tapfile and break out of the loop.
2022-08-31 12:11:36 +02:00
Krzesimir Nowak
d17656431b ci-automation: Break retest cycle properly in qemu on arm64
Rerunning the test will always yield the same result in this case, so
it's pointless.
2022-08-25 10:07:53 +02:00
Krzesimir Nowak
8c278422e6 ci-automation/packages.sh: Fix access to unbound variable
We were running the run_sdk_container script with passing a value of a
variable named version to the script through the -v flag. But nowhere
is the variable defined. This worked under jenkins, because jenkins
job has a version parameter that gets exported into environment under
the same name. But running it manually outside jenkins revealed the
bug.

The script should have been using a vernum variable. Now, the
difference between this variable and the version variable is that
"version" was in form of <channel>-<version>-<build_id>, whereas
"vernum" comes without the channel part. Fortunately,
"run_sdk_container" was stripping the channel part before using this
value, so it makes no difference whether we pass
main-3333.0.0-some-id or just 3333.0.0-some-id.
2022-08-25 10:07:53 +02:00
Krzesimir Nowak
cff99646ac ci-automation: Sync used EquinixMetal region to use for ARM64 servers
Recently we changed the region from DA (Dallas) to DC (Washington),
because there are more ARM64 servers available. Reflect this change in
the new pipeline too.
2022-08-05 13:23:02 +02:00
Krzesimir Nowak
ea98e8e3bc ci-automation/vendor-testing/azure.sh: Use an array for extra instance types 2022-08-05 13:22:56 +02:00
Krzesimir Nowak
27abebb190 ci-automation/vendor-testing/azure.sh: Use proper machine size on arm64 2022-08-05 13:22:56 +02:00
Krzesimir Nowak
f34f52706d ci-automation/vendor-testing/azure.sh: Fix unbound variable use
This gets triggered when the test is rerun and an existing image is
reused.
2022-08-05 13:22:56 +02:00
Krzesimir Nowak
9fa0f0f8f9 ci-automation/vendor-testing/azure.sh: Fix hyperv generation argument
The "v" must be a capital letter. It seems that Azure got pickier about
the parameters it accepts.
2022-08-05 13:22:56 +02:00
Kai Lueke
11e25ab0c9 ci-automation: Move git tagging into own script
When the build system runs the packages jobs for both architectures in
parallel and has to create a new tag, tagging fails due to the race in
the tagging.
Move the git tagging to its own script that is run from a new top-level
job that starts the packages jobs for both architectures.
2022-07-19 19:35:43 +02:00
Krzesimir Nowak
6d9526be84 ci-automation: Generate digests files for the built artifacts 2022-07-14 16:55:00 +02:00
Krzesimir Nowak
f618600f63 ci-automation: Add a function for generating digests
It works in a similar way to sign_artifacts - it takes a signer, a
list of files and directories, and generates digests next to the
respective files.
2022-07-14 16:55:00 +02:00
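A function of this shape could be sketched as below. This is not the implementation from the commit: the signer argument is left out, and the name `generate_digests`, the `.sha512` suffix, and the use of `sha512sum` are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical sketch: walk the given files and directories and write
# "<file>.sha512" next to each regular file found.
generate_digests() {
    local item file dir base
    for item in "$@"; do
        while IFS= read -r -d '' file; do
            dir="$(dirname "${file}")"
            base="$(basename "${file}")"
            # Run in the file's directory so the digest file records a
            # relative name and can be verified from that directory.
            ( cd "${dir}" && sha512sum "${base}" > "${base}.sha512" )
        done < <(find "${item}" -type f ! -name '*.sha512' -print0)
    done
}
```

Passing a directory covers every regular file below it, and existing digest files are skipped so the function can be run again without producing digests of digests.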
Krzesimir Nowak
ee7d5ed044 ci-automation: Factor out listing files into a separate function
This will come in handy when listing files for creating digests files.
2022-07-14 16:55:00 +02:00
Kai Lueke
ce588b9d01 ci-automation: Show changes by finding the previous channel
The image comparison was done against the old release in the channel
we release to instead of the previous release with the same major
version. This means when a channel transition happens we see a large
diff instead of the diff against the previous release. While not bad
for finding problems, this is normally not needed. However, we want
to have two changelogs generated, one against the old release in the
channel we release to and one against the previous release with the same
major version when a transition happens. There was no changelog
printing yet, and this is added now.
2022-07-14 13:45:04 +02:00
Gabriel Adrian Samfira
479fe8cefc Disable image mangle in qemu tests
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-07-13 18:16:12 +02:00
Gabriel Adrian Samfira
2593063eea Make QEMU_UEFI_BIOS configurable
Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-07-13 18:16:12 +02:00
Gabriel Adrian Samfira
c869260f82 Add configuration options to test functions
* Add SKIP_COPY_TO_BINCACHE environment variable that will skip
    uploading test results to bincache. This is useful if we want to
    upload test results as artifacts on github.
  * make QEMU_IMAGE_NAME configurable

Signed-off-by: Gabriel Adrian Samfira <gsamfira@cloudbasesolutions.com>
2022-07-13 18:16:12 +02:00
Gabriel Adrian Samfira
19ac76082b Add CI workflow 2022-07-13 18:12:23 +02:00
Kai Lueke
7e119be27a ci-automation: Only store compressed images
The new build pipeline compresses images already but uploaded both the
compressed and uncompressed files because the whole build folder gets
uploaded.
Add a new flag "--only_store_compressed" to the image generation which
deletes the uncompressed file after compression is done. Uncompressed
images are still supported if specified in the flag
"image_compression_formats".

Closes https://github.com/flatcar-linux/Flatcar/issues/793
2022-07-06 15:56:31 +02:00
Kai Lueke
8dffd734a6 ci-automation: Run package-diff to report image changes
The original pipeline has package-diff commands to print out image
differences compared to the last release. This is used for the release
Go/No-Go QA checks.
Add the same logic to the new pipeline.
2022-06-29 15:37:32 +02:00
Kai Lueke
a97dcfacee ci-automation: Use the package container for VM image building
The image job builds an image container that is multiple GBs big and
takes >10 mins to be loaded in the vms job. The vms job can also do its
work by running from the packages container from the packages job when
it fetches the built image from bincache first, assuming the images
job copies it there.
Skip generating the image container and instead use the packages
container for VM image building by copying the image folder first to
bincache and then retrieving it from there. While reworking this we
also address the issue that the VMs container had used the same name
for both architectures, causing a race when both run in parallel on
the same worker.
2022-06-29 15:37:32 +02:00
Kai Lueke
44f62d5bc1 ci-automation: align VM image compression with existing pipeline
In jenkins/vms.sh the Digital Ocean and OpenStack images also get
compressed with gzip.
Do so for the new pipeline, too.
2022-06-29 11:41:10 +02:00
Krzesimir Nowak
c3e5e754e9
Merge pull request #334 from flatcar-linux/krnowak/sign-images
ci-automation: Sign the artifacts
2022-06-03 17:29:13 +02:00
Krzesimir Nowak
527bd2237b ci-automation: Sign artifacts and upload the signatures
It uses the SIGNER environment variable to decide whether the
signatures should be created or not. It expects the key of the SIGNER
to exist in GPGHOME, and that's what gpg_setup.sh is already doing.

In some places we need to recursively change the owner of the
directory that contains artifacts to be signed, otherwise we won't be
able to create new files with signatures there. This is because some
of the artifacts are either created inside the SDK container (so the
created files belong to root outside the container) or are created
with `sudo`.
2022-06-03 14:59:38 +02:00
Krzesimir Nowak
0e0eb67ca2 ci-automation: Set up keys for signing
Not used for anything yet. This sets up a temporary GPGHOME directory
and a trap that will remove it after we are done.
2022-06-03 14:59:26 +02:00
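The temporary directory plus cleanup trap pattern can be sketched as follows. The wrapper name `with_temp_gpghome` is hypothetical, and no keys are imported here; the variable name GPGHOME is taken from the commit message.

```shell
#!/bin/bash
# Sketch: create a temporary GPGHOME and remove it when the subshell exits.
# The function body is ( ... ), so the trap and the variable stay local to it.
with_temp_gpghome() (
    GPGHOME="$(mktemp -d)"
    export GPGHOME
    trap 'rm -rf "${GPGHOME}"' EXIT
    chmod 700 "${GPGHOME}"
    rc=0
    "$@" || rc=$?
    exit "${rc}"
)

with_temp_gpghome bash -c 'echo "GPGHOME is ${GPGHOME}"'
```

Because the trap fires on exit of the subshell, the directory is removed whether the wrapped command succeeds or fails.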
Krzesimir Nowak
090d7ec176 ci-automation: Run functions in subshells
The functions are sourcing other files that define global variables,
so they will spill into the caller's shell unnecessarily. We will also
add some functionality that uses traps in follow-up commits, so it's
good to limit the scope of traps too.
2022-06-03 14:58:29 +02:00
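The effect of running a function body in a subshell can be seen in a small example. The function and variable names below are made up for illustration.

```shell
#!/bin/bash
# Illustration: a function defined with ( ... ) as its body runs in a
# subshell, so variables it sets (directly or through sourced files)
# and traps it installs do not reach the caller.
leaky() {
    LEAKED_SETTING="set by leaky"
}

contained() (
    CONTAINED_SETTING="set by contained"
)

leaky
contained
echo "LEAKED_SETTING=${LEAKED_SETTING:-<unset>}"
echo "CONTAINED_SETTING=${CONTAINED_SETTING:-<unset>}"
```

The trade-off is that a subshell function cannot hand results back through variables; it has to use its output or exit status instead.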
Krzesimir Nowak
698d0de129 ci-automation: Trivial fixes
Dropped some trailing whitespace, fixed a typo. Trivial.
2022-06-03 14:56:51 +02:00
Krzesimir Nowak
cec96aeec5 ci-automation/vendor-testing/azure.sh: Small fixes
Fix some comments, quote some variables.
2022-06-02 18:40:06 +02:00
Kai Lueke
6c0fb8959d ci-automation/vendor-testing/azure.sh: Align timeout with GC duration
The kola test run time shouldn't be longer than the GC duration to
prevent failing tests caused by GC interference.
Align the Azure kola timeout with the GC duration.
2022-06-02 14:08:28 +09:00
Sayan Chowdhury
42608d3c67
ci-automation/azure: Add initial container tests infra for Azure (#274)
The Azure tests use a similar logic as the GCE tests where the
instance type parameter normally used in AWS/Equinix Metal tests is
here used to specify whether the VM gets started in Gen V1 or V2 mode.

Signed-off-by: Sayan Chowdhury <schowdhury@microsoft.com>
Co-authored-by: Kai Lüke <pothos@users.noreply.github.com>
2022-05-27 08:01:59 +02:00
Kai Lueke
cee8a6aadf ci-automation: Push version file early
A nightly build that pushes the version file to the branch was doing
so only at the end of the build, causing the push to fail if something
else got merged in between.
Push the version file early by generating it the same way it would be
generated by the run_sdk_container/bootstrap_sdk_container scripts.
In the case of the SDK the version file gets the same version for the
OS and the SDK. Add some explanations about the version formats. Note
that the scripts will still rewrite the file but it should be a no-op.
2022-05-23 22:40:02 +09:00
Kai Lueke
95367851fa ci-automation/sdk_bootstrap.sh: Allow omitting the optional parameters
The coreos/portage refs were allowed to be empty strings but the way
the function was run from Groovy the lack of quoting caused the empty
strings to be missing parameters.
Since the two parameters are meant to be optional, support omitting
them.
2022-05-23 19:29:39 +09:00
Krzesimir Nowak
8c3d7b977b ci-automation: Fix potential use of unbound variable error
`local -a stuff` does not make `stuff` a bound array variable, so
checking length of the array will trigger an error about unbound
variable. Fortunately, `local stuff=()` does the trick.
2022-05-11 12:43:08 +02:00
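The safe form can be sketched as below, assuming the script runs with `set -u`. The function name `count_items` is hypothetical. The example only demonstrates the explicitly initialized array, since how bash treats a declared but unassigned array under `set -u` may differ between versions.

```shell
#!/bin/bash
set -u

# With nounset enabled, initialize the array explicitly so that its
# length can be queried even when nothing was appended to it.
count_items() {
    local stuff=()
    local item
    for item in "$@"; do
        stuff+=("${item}")
    done
    echo "${#stuff[@]}"
}

count_items
count_items a b c
```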