Commit Graph

168 Commits

Author SHA1 Message Date
Kai Lueke
bc3a9aeacd qemu_update: Add update test from an old release
To ensure that we can update from very old releases, add a test that
starts from a fixed old release. We use the Stable release that
introduced arm64 support so that the same test logic works for both
architectures.
2022-11-29 16:51:27 +01:00
Krzesimir Nowak
fbb962c7f6 ci-automation: Add an environment variable to skip build shortcuts
This will be used for the "run all tests" days in Jenkins.
2022-11-03 12:00:10 +01:00
Kai Lueke
3cb9736c33 ci-automation: Use plain AMI image for uploads
Recently we ran into sporadic corruption issues for AWS EC2 AMIs.
We use the streamOptimized VMDK format and it seems to cause problems
on the AWS side, regardless of whether it was created by qemu-img or
vmdk-convert.
Switch to using the plain AMI images for uploading as a workaround.
2022-10-28 17:21:39 +02:00
Kai Lueke
25dbccc14d ci-automation: Support local patches
For embargoed releases it is useful to apply patches locally to build
with them before they are public. This allows us to push the same patches
to the repo during the Flatcar release at the embargo lift. The result
is the same (as long as the scripts patches did not change parts of the
setup logic that ran before they got applied); we can just build
earlier and thus do the Flatcar release directly at the embargo lift
instead of having to wait with the build because it would require the
patches to be in the repos.
2022-10-27 11:53:33 +02:00
Krzesimir Nowak
06d2aabaa2 ci-automation/vendor-testing/vmware.sh: Fix unbound variable use
This gets triggered when the test is rerun and an existing image is
reused.
2022-10-11 15:25:56 +02:00
Jeremi Piotrowski
de132c62d5 Merge pull request #521 from flatcar/jepio/gpg-import-batch
ci-automation: use --batch when importing gpg key
2022-10-06 09:52:07 +02:00
Kai Lueke
00223be1c7 ci-automation/release: Only upload SDK if a new one was built
A release includes an SDK if its SDK version is the release version.
Only then do we need to upload a new SDK container image.
2022-10-04 14:24:28 +02:00
Jeremi Piotrowski
6e11ae3394 ci-automation: use --batch when importing gpg key
All invocations of gpg in ci-automation pass --batch as an argument except the
import. Be consistent by having it included everywhere. Additionally, since
ci-automation runs wrapped in a systemd service, no tty is available so batch
is needed for correctness.
2022-10-04 10:22:43 +02:00
Mathieu Tortuyaux
289cc52c5f automation/gc: add openstack garbage collector
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
2022-09-29 11:21:25 +02:00
Mathieu Tortuyaux
de8b4eae6a ci-automation: add openstack to tested vendors
Missing link to enable the tests in the Flatcar test suite.

Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
2022-09-29 11:21:25 +02:00
Kai Lueke
89495373d9 ci-automation: Ensure to use latest container image
The container image was only created if it didn't exist locally. This
would result in fixes not being in a downstream job that is scheduled
to a different worker node on Jenkins that has a stale copy.
For the build automation we will now always download the latest
container tarball based on comparing the image ID from a new artifact,
and for registry images we pull the container image to make sure that
we don't use a stale copy when we rebuild.
2022-09-29 10:04:23 +02:00
Kai Lüke
dca21df916 Merge pull request #513 from flatcar/kai/container-fallback
ci-automation: Fallback also to the mirror for container download
2022-09-27 17:49:53 +02:00
Kai Lueke
20643b260e ci-automation: Fallback also to the mirror for container download
When there is no SDK container image in the registry, the fallback
looks at bincache but bincache isn't backed up and may be cleaned of
old releases. While this won't be the regular case, the container
image registry may be unavailable (or renamed as happened now), or
people would like to rerun the image job which relies on the packages
container.
2022-09-27 15:53:33 +02:00
Krzesimir Nowak
24213a5c96 ci-automation: Download correct previous image for LTS release
The qemu_update vendor test was downloading the wrong LTS image when it
was testing the old LTS image. This is because it was using the current
symlink, which for the LTS channel always points to the new LTS. The old
LTS is available under the current-${YEAR} symlink. We can get the
year from the lts-info file.
2022-09-27 11:56:39 +02:00
Krzesimir Nowak
2606380396 ci-automation: Fix unbound variable errors
FLATCAR_VERSION and FLATCAR_SDK_VERSION are defined in the version
file, so it should be sourced before trying to use those. Here we try
to do it in a limited scope.

Also, SDK container link should use the dockerized version in a
directory name.
2022-09-27 10:55:08 +02:00
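Sourcing the version file inside a subshell is one way to keep its variables in a limited scope. The following is a minimal sketch, assuming a version file that contains plain FLATCAR_VERSION=... and FLATCAR_SDK_VERSION=... assignments; the helper name print_flatcar_version is hypothetical and not taken from ci-automation.

```shell
# Hypothetical helper, not the actual ci-automation code.
# Prints FLATCAR_VERSION from the given version file. The ( ... ) subshell
# keeps the sourced variables out of the calling shell.
print_flatcar_version() {
    (
        # Pass a path that contains a slash; a bare file name would first
        # be looked up in PATH by the "." builtin.
        . "$1"
        printf '%s\n' "${FLATCAR_VERSION}"
    )
}
```

After a call such as print_flatcar_version ./version.txt (the path is only an example), FLATCAR_VERSION is still unset in the caller, so set -u keeps catching accidental uses elsewhere.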
Kai Lueke
326c645647 ci-automation: Fix syntax error
2022-09-26 17:24:53 +02:00
Kai Lueke
bca6e6e41d ci-automation: Don't skip nightly build when the previous one failed
Currently we skip the nightly build if there are no changes. This
didn't work well after a failed build because the rerun became a no-op
and could not fix the failure.
Check whether the main artifacts we expect from a step are present, as
a simple heuristic for whether a rerun is needed.
2022-09-26 17:06:21 +02:00
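The artifact heuristic can be sketched as follows. This is only an illustration under assumptions: it checks files in a local directory, whereas the real step may query a remote cache, and the function name artifacts_missing and the file names in the usage note are hypothetical.

```shell
# Hypothetical sketch of an "is a rerun needed?" heuristic.
# Usage: artifacts_missing DIR ARTIFACT...
# Returns 0 if at least one expected artifact is missing (rerun needed),
# and 1 if all of them are present (the step can be skipped).
artifacts_missing() {
    local dir="$1"
    shift
    local artifact
    for artifact in "$@"; do
        if [ ! -e "${dir}/${artifact}" ]; then
            return 0
        fi
    done
    return 1
}
```

A caller would skip the step only when there are no changes and artifacts_missing "${output_dir}" image.bin.bz2 version.txt returns non-zero; both the variable and the file names here are placeholders.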
Kai Lueke
18627499c1 Annotate a copied function
I found a duplicate function and verified that it's the only one via
comm -12 <(sort ci-automation/ci_automation_common.sh) <(sort sdk_lib/sdk_container_common.sh) | grep function
I'm not sure if this is due to a case where we only import one but
can't import the other, hence I'm not deleting it now.
2022-09-26 15:39:45 +02:00
Kai Lueke
3fef1eb801 ci-automation/release: Set up secret envs
2022-09-22 18:31:50 +02:00
Kai Lueke
ffee812d32 ci-automation/release: Run plume release only once
We need to run plume only once for each arch, move it out of the loop.
Also, address some smaller things that shellcheck complains about.
2022-09-22 18:31:50 +02:00
Kai Lueke
79d89faf91 ci-automation/secret_to_file: Fix usage from subshell
This failed when used from ( secret_to_file ... VAR ; cat $VAR )
because ( ) starts a new subshell PID and secret_to_file's returned
/proc/PID/fd/X path was then using the wrong PID.
2022-09-22 18:31:50 +02:00
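The general pitfall is that a subshell runs under a different PID than its parent, so a /proc/PID/fd/X path built from one PID does not refer to the other process. The sketch below only shows how the two PIDs diverge, assuming bash on Linux; it does not reproduce secret_to_file or its fix.

```shell
# Sketch of the pitfall only; this is not the secret_to_file implementation.
# $$ keeps the PID of the top-level shell even inside a subshell, while
# bash's BASHPID reports the PID of the process that is actually running.
top_level_pid="$$"
dollar_in_subshell="$( echo "$$" )"
bashpid_in_subshell="$( echo "${BASHPID}" )"

echo "top-level shell:         ${top_level_pid}"
echo "\$\$ inside subshell:      ${dollar_in_subshell}"
echo "BASHPID inside subshell: ${bashpid_in_subshell}"
```

The first two values are equal and the third one differs, which is why a path derived from a PID has to be built with the PID of the process that really owns the file descriptor.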
Kai Lueke
ef8f20f9dd ci-automation/release: Disable GCS auth for plume pre-release
When GCS auth is expected, plume would upload the AMI list to GCS.
2022-09-22 18:31:50 +02:00
Mathieu Tortuyaux
593cf19a7a release: get product IDs from Jenkins
The JSON object is passed from the Groovy script to the release script;
we just need to extract the correct AWS Marketplace product ID based on
the "<channel>-<arch>".

The exception is stable-amd64, where we also need to get the stable-pro
product ID.

Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
2022-09-22 18:31:50 +02:00
Mathieu Tortuyaux
27b62deb81 sdk_container: publish the SDK on a Docker registry
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
2022-09-22 18:31:50 +02:00
Kai Lueke
20ed1ad3a4 ci-automation/release.sh: Run plume to release cloud images
The mantle plume tool has two steps: pre-release is the mere upload and
release is the publication. In the past this was used to run the tests
in between, but we don't do this anymore.
Run plume pre-release and release in a single job. Since plume can't
push to GCS in our case, we upload the files to bincache. Also do the
cloudformation update which was previously done in
flatcar-build-scripts but could only be run after the sync to Origin.
It requires the "aws" tool in the mantle container until we implement
this in plume directly.
2022-09-22 18:31:48 +02:00
Krzesimir Nowak
1585ede78a ci-automation: Implement a stricter image version check
I made a mistake and wrote a version like main-3363-0.0-stuff (note a
dash instead of a dot after the first number). Surprisingly the build
chugged along just fine almost until the end of the image job. It only
detected the invalid version string when the job wanted to create a
version.txt file:

ERROR   build_image: script called: build_image '--board=amd64-usr' '--group=developer' '--output_root=/home/sdk/build/images' '--only_store_compressed' '--torcx_root=/home/sdk/build/torcx' 'prodtar' 'container'
ERROR   build_image: Backtrace:  (most recent call is last)
ERROR   build_image:   file build_image, line 196, called: split_ver '3363' 'SPLIT'
ERROR   build_image:   file common.sh, line 192, called: die 'Invalid version string '3363''
ERROR   build_image:
ERROR   build_image: Error was:
ERROR   build_image:   Invalid version string '3363'

Let's have a stricter version check at the beginning of the build
process, so the process fails sooner rather than later.
2022-09-19 12:12:37 +02:00
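An early check of this kind could look roughly like the sketch below. It is not the check added by the commit: the function name is_valid_image_version is hypothetical, and the accepted format (<channel>-<major>.<minor>.<patch> with an optional -<build_id> suffix and a fixed list of channel names) is an assumption made for illustration.

```shell
# Hypothetical check, not the one added by the commit. Assumes versions of
# the form <channel>-<major>.<minor>.<patch>[-<build_id>].
# Returns 0 for a well-formed version string and non-zero otherwise.
is_valid_image_version() {
    printf '%s\n' "$1" |
        grep -Eq '^(main|alpha|beta|stable|lts)-[0-9]+\.[0-9]+\.[0-9]+(-[A-Za-z0-9._-]+)?$'
}
```

With this pattern main-3363.0.0-stuff is accepted, while main-3363-0.0-stuff is rejected before any build step runs.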
Kai Lueke
91a26e5e1e Use new github org name "flatcar"
The "flatcar-linux" github org was renamed to "flatcar". There are no
github redirections in place and we have to update all links.
2022-09-14 14:33:27 +02:00
Kai Lueke
edba76c012 Use ghcr.io/flatcar, there are no redirects
The GitHub org rename also moved the ghcr.io container image repo but
in contrast to git repos, there are no redirects!
2022-09-14 14:33:24 +02:00
Krzesimir Nowak
1ecea3544f ci-automation: Change the way we prepare torcx manifest for testing
Now URLs for torcx packages are always present in the torcx manifest,
but for releases they may be pointing to the origin server where the
packages will be eventually uploaded. At the time of running the
tests, those packages are still only in the build cache, so change the
URLs to point to the build cache, so the test can pass.
2022-09-06 14:00:50 +02:00
Krzesimir Nowak
b2d6f7fc6e *: Allow specifying extra URLs for torcx packages
Torcx manifest may contain paths and URLs as locations of
packages. There are two kinds of packages - vendored and
extra. Vendored packages normally have two locations - path to the
directory inside the image where the package is (which is why it's
called vendored), and a URL to the package on some remote
server. Extra packages only have a URL. But the URLs are added only
when we tell the build_torcx_store script to upload the packages at
the same time, which is what the old build pipeline was doing. With
the new pipeline, the upload happens as a separate step, thus the
upload is disabled when invoking build_torcx_store, and so the
packages are not getting URLs set. This change went unnoticed, because
a kola test checking the generated torcx manifest was only checking if
there is at least one location, either path or URL, and all the new
releases have no extra packages, only vendored ones.

When backporting the new pipeline to old LTS, the kola tests started
to fail, because old LTS had one extra package, and this is how I
noticed the problem.
2022-09-06 14:00:50 +02:00
Kai Lueke
b30654ef22 ci-automation: Prepare release job
The old pipeline had a release job where mantle's plume release tool
was invoked to publish the cloud images.
Implement a release job in the new pipeline with the same goals and
eventually even more automation.
2022-09-05 11:41:41 +02:00
Kai Lueke
1319e4c95a ci-automation: Move image change report to own file
To make it easier to review the image changes and the changelog, and to
iterate on them after fixes without rebuilding the image, move this logic
to its own file where a new job could call it.
2022-09-05 11:41:41 +02:00
Kai Lüke
7b7c3e5b76 Merge pull request #425 from flatcar-linux/kai/em-m3
Cover Equinix Metal m3.small.x86 instances in release test
2022-09-01 13:34:20 +02:00
Krzesimir Nowak
8b52a9b04c ci-automation: Use an array for storing failed tests
2022-08-31 09:37:18 +02:00
Krzesimir Nowak
8cd06230ba ci-automation: Print failed tests nicer
Instead of printing failed tests like this:

    Failed tests: kubeadm.v1.25.0.cilium.base
    kubeadm.v1.24.1.cilium.base

Do it like this:

    Failed tests:
    kubeadm.v1.25.0.cilium.base
    kubeadm.v1.24.1.cilium.base
2022-08-31 09:37:18 +02:00
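One way to get the second layout from a bash array is to print the heading on its own line and let printf repeat its format for every element. This is a minimal sketch; the function name print_failed_tests and the array name failed_tests are hypothetical and may differ from the actual script.

```shell
# Hypothetical helper: prints a heading, then one failed test per line.
# printf reuses the '%s\n' format for each argument it receives.
print_failed_tests() {
    echo "Failed tests:"
    printf '%s\n' "$@"
}

# Failed tests collected in a bash array, one element per test name.
failed_tests=( "kubeadm.v1.25.0.cilium.base" "kubeadm.v1.24.1.cilium.base" )
print_failed_tests "${failed_tests[@]}"
```

Quoting the array expansion as "${failed_tests[@]}" keeps each test name as a single argument. The helper is meant to be called only when the array is non-empty, since printf would otherwise print one empty line.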
Krzesimir Nowak
9e05a07a77 ci-automation: Return 1 on broken cycle
We set success to true when the test cycle was broken, which was
a hacky way to avoid printing the give up message. But setting
success to true also meant that the script returned with status 0,
which is wrong.

Add another variable for controlling printing the give up message.
2022-08-31 09:37:18 +02:00
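The idea of decoupling the exit status from the give up message can be sketched like this. Everything in it is an assumption made for illustration: the function name run_test_cycle, the limit of three attempts, and the use of exit status 2 to signal that retrying is pointless are not taken from the actual script.

```shell
# Hypothetical sketch, not the actual test runner.
# Runs the given command up to three times. Exit status 2 from the command
# is treated here as "retrying is pointless" and breaks the cycle.
run_test_cycle() {
    local success=false
    local print_give_up=true
    local attempt rc
    for attempt in 1 2 3; do
        rc=0
        "$@" || rc=$?
        if [ "${rc}" -eq 0 ]; then
            success=true
            break
        fi
        if [ "${rc}" -eq 2 ]; then
            # Broken cycle: suppress the message but keep success=false.
            print_give_up=false
            break
        fi
    done
    if [ "${success}" = false ] && [ "${print_give_up}" = true ]; then
        echo "Giving up after ${attempt} attempts" >&2
    fi
    # The return status depends only on success, never on the message flag.
    [ "${success}" = true ]
}
```

Because the message is controlled by its own flag, a broken cycle stays silent and still returns a non-zero status.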
Krzesimir Nowak
6c77ebde54 ci-automation: Break test cycle properly
Create a tapfile and break out of the loop.
2022-08-31 09:37:18 +02:00
Kai Lueke
b8133d92a0 Cover Equinix Metal m3.small.x86 instances in release test
The new m3.small instance does not have official Flatcar support yet
but we can already cover it in our PXE boot release tests.
The c3.small instances are legacy and m3.small is the new smallest
type.
2022-08-24 18:57:17 +02:00
Krzesimir Nowak
73bb00a9d0 ci-automation: Break retest cycle properly in qemu on arm64
Rerunning the test will always yield the same result in this case, so
it's pointless.
2022-08-24 13:48:35 +02:00
Krzesimir Nowak
2d226f864e ci-automation/packages.sh: Fix access to unbound variable
We were running the run_sdk_container script and passing it the value
of a variable named version through the -v flag. But the variable is
not defined anywhere. This worked under Jenkins, because the Jenkins
job has a version parameter that gets exported into the environment
under the same name. But running it manually outside Jenkins revealed
the bug.

The script should have been using a vernum variable. Now, the
difference between this variable and the version variable is that
"version" was in form of <channel>-<version>-<build_id>, whereas
"vernum" comes without the channel part. Fortunately,
"run_sdk_container" was stripping the channel part before using this
value, so it makes no difference whether we pass
main-3333.0.0-some-id or just 3333.0.0-some-id.
2022-08-24 13:48:35 +02:00
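Why both forms end up equivalent can be illustrated with a small sketch that strips an optional channel prefix. The helper name strip_channel is hypothetical, and run_sdk_container may implement the stripping differently; the sketch assumes that the version number itself always starts with a digit.

```shell
# Hypothetical helper; run_sdk_container may strip the channel differently.
# Prints the version without a leading "<channel>-" part.
strip_channel() {
    case "$1" in
        # Starts with a digit: there is no channel prefix to remove.
        [0-9]*) printf '%s\n' "$1" ;;
        # Otherwise drop everything up to and including the first dash.
        *-*)    printf '%s\n' "${1#*-}" ;;
        *)      printf '%s\n' "$1" ;;
    esac
}
```

Both strip_channel main-3333.0.0-some-id and strip_channel 3333.0.0-some-id print 3333.0.0-some-id.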
Krzesimir Nowak
1974033edd ci-automation: Sync used EquinixMetal region to use for ARM64 servers
Recently we changed the region from DA (Dallas) to DC (Washington),
because there are more ARM64 servers available. Reflect this change in
the new pipeline too.
2022-08-05 11:14:36 +02:00
Krzesimir Nowak
661a4067a1 ci-automation/vendor-testing/azure.sh: Use an array for extra instance types
2022-08-03 16:23:15 +02:00
Krzesimir Nowak
23a05949c1 ci-automation/vendor-testing/azure.sh: Use proper machine size on arm64
2022-08-03 16:22:38 +02:00
Krzesimir Nowak
4d09ab35d6 ci-automation/vendor-testing/azure.sh: Fix unbound variable use
This gets triggered when the test is rerun and an existing image is
reused.
2022-08-03 15:21:00 +02:00
Krzesimir Nowak
7f5282e259 ci-automation/vendor-testing/azure.sh: Fix hyperv generation argument
The "v" must be a capital letter. It seems that Azure got pickier about
the parameters it accepts.
2022-08-03 15:21:00 +02:00
Kai Lueke
5e0dc0a85d ci-automation: Move git tagging into own script
When the build system runs the packages jobs for both architectures in
parallel and has to create a new tag, tagging fails due to a race
between the two jobs.
Move the git tagging to its own script that is run from a new top-level
job that starts the packages jobs for both architectures.
2022-07-18 19:20:44 +02:00
Krzesimir Nowak
a96a66d222 Merge pull request #376 from flatcar-linux/krnowak/digests
ci-automation: Generate digests for artifacts
2022-07-14 14:42:49 +02:00
Kai Lüke
f83ee4f9a1 Merge pull request #375 from flatcar-linux/kai/print-changelog
ci-automation: Show changes by finding the previous channel
2022-07-14 13:44:22 +02:00
Kai Lueke
da370b54c1 ci-automation: Show changes by finding the previous channel
The image comparison was done against the old release in the channel
we release to instead of the previous release with the same major
version. This means when a channel transition happens we see a large
diff instead of the diff against the previous release. While not bad
for finding problems, this is normally not needed. However, we want
to have two changelogs generated, one against the old release in the
channel we release to and one against the previous release with the same
major version when a transition happens. There was no changelog
printing yet, and this is added now.
2022-07-13 19:11:50 +02:00
Kai Lüke
76b47a00b2 Merge pull request #374 from gabriel-samfira/make-workflow-pluggable
Make the kola test workflow reusable
2022-07-13 18:09:43 +02:00