We override `PARALLEL_TESTS` because a kola run with PARALLEL_TESTS >= 4
causes the tests to provision >= 12 ARM servers at the same time. As the
da11 region does not have that many free ARM servers, the whole test run
fails. With PARALLEL_TESTS=2 the total number of servers stays < 10.
In addition, we override `timeout` to 10 hours, because it takes more
than 8 hours to run all tests with only 2 tests in parallel.
Equinix Metal ARM servers are not yet available hourly in the default `sv15` region,
so we override `PACKET_REGION` to `Dallas`, where they are available.
We do not override `PACKET_REGION` for both boards at the top level because we need to keep proximity
for PXE booting.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
Currently the os/sdk and os/toolchains jobs perform a chroot update whose
results are immediately discarded because the rest of the build uses a fresh
chroot and catalyst. Towards the end of a release period this can extend the
build time by about an hour (longer if rust is involved).
Introduce a `--setuponly` flag that bails after the chroot configuration and
thus skips the chroot update.
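The early exit can be pictured with a minimal sketch. The function names `configure_chroot`, `update_packages` and `update_chroot_sketch` are hypothetical stand-ins, and the flag parsing is simplified; the real script's parsing is not shown here.

```shell
# Hypothetical stand-ins for the real setup and update steps.
configure_chroot() { echo "chroot configured"; }
update_packages() { echo "chroot updated"; }

update_chroot_sketch() {
  setuponly=0
  for arg in "$@"; do
    case "${arg}" in
      --setuponly) setuponly=1 ;;
    esac
  done

  configure_chroot
  if [ "${setuponly}" -eq 1 ]; then
    return 0  # bail out before the expensive update step
  fi
  update_packages
}

update_chroot_sketch --setuponly
```

Called without the flag, the sketch runs both steps.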
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
and add the script used for that purpose. This requires access to a GitHub PAT
with 'repo:status' permissions.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Currently the kubeadm tests fail on arm64 because the instance type
only offers 1 vCPU:
cluster.go:117: error execution phase preflight: [preflight] Some fatal errors occurred:
cluster.go:117: [ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
Switch to the next larger instance type, which has 2 vCPUs.
If the test is run for ARM64, there is no need to run `update_chroot`
since there is no SDK.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
The SDK can either be a release SDK or a dev build SDK which are stored
in different paths. DOWNLOAD_ROOT_SDK should be based on the
SDK_URL_PATH value which indicates whether it's a release or dev build
path.
bootstrap_sdk runs catalyst.sh, which will try to download the SDK if the
digest verification fails.
Importing the DIGEST lets us skip this step and continue with the
previously downloaded SDK.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
When PORTAGE_REF or OVERLAY_REF are numbers, we can change the way the refspec
is constructed to allow fetching a PR instead of a branch. Checking for
equality using '[' works to detect numbers; bash's '[[' doesn't.
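A minimal sketch of the number check, assuming the comparison is done with `-eq`. The helper names `is_number` and `refspec_for` are hypothetical, and the `refs/pull/<N>/head` layout is the GitHub convention, assumed here for illustration only.

```shell
# Succeeds only when $1 is an integer: '[' reports an error for
# non-numeric operands of -eq, which is silenced here.
is_number() {
  [ "$1" -eq "$1" ] 2>/dev/null
}

# Hypothetical helper: pick a refspec depending on whether the ref is a PR number.
refspec_for() {
  if is_number "$1"; then
    echo "refs/pull/$1/head"
  else
    echo "refs/heads/$1"
  fi
}

refspec_for 1234
refspec_for main
```

Inside '[[', bash evaluates the operands of `-eq` as arithmetic expressions, so a branch name is read as a variable name that evaluates to 0 and the comparison succeeds, which makes it unsuitable for this check.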
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Otherwise, it was failing because we check for unbound variables:
```
/bin/bash: line 1: PORTAGE_BINHOST: unbound variable
```
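One common way to avoid this error, shown as a sketch and not necessarily what this change does, is to expand the variable with an empty default. The `binhost` variable name is hypothetical.

```shell
set -u  # treat unset variables as errors, as the failing job did

unset PORTAGE_BINHOST  # simulate an environment that does not provide it

# "${PORTAGE_BINHOST}" would abort here with "unbound variable".
# "${PORTAGE_BINHOST:-}" expands to an empty string instead.
binhost="${PORTAGE_BINHOST:-}"

if [ -n "${binhost}" ]; then
  echo "using binhost: ${binhost}"
else
  echo "no binhost configured"
fi
```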
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
Otherwise, the variable is empty and causes errors later. The default
value is `gs://flatcar-jenkins` rather than `GS_DEVEL_ROOT` because, in
the previous behavior, `DOWNLOAD_ROOT` was hardcoded with:
```shell
DOWNLOAD_ROOT_SDK=https://storage.googleapis.com/flatcar-jenkins/sdk
```
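The relation between the default and the old hardcoded URL can be sketched as follows. The helper name `sdk_download_root` is hypothetical, and the derivation only illustrates that a `gs://<bucket>` path maps to `https://storage.googleapis.com/<bucket>`; the actual script may build the URL differently.

```shell
# Hypothetical helper: derive the HTTPS SDK root from a gs:// root,
# falling back to the default named in the text.
sdk_download_root() {
  root="${1:-gs://flatcar-jenkins}"
  # Strip the gs:// scheme to get "<bucket>[/<prefix>]".
  echo "https://storage.googleapis.com/${root#gs://}/sdk"
}

sdk_download_root
sdk_download_root "gs://example-bucket"
```

With no argument, the sketch yields the same URL as the previously hardcoded value.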
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
`$verify_key` actually holds `--verify-key=verify.asc`, so
`systemd-nspawn` fails because it does not expect a `--verify-key` argument.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
The catalyst build seeds from the same SDK version as the current SDK, but
will only reuse the cached tarball if a DIGESTS file exists and is correct.
Prefetch this file to prevent the build from trying to access Google Storage
anonymously.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
because we need to pass google credentials to update_chroot, and 'cork update'
doesn't support that.
Add --sdk-url-path to sdk.sh for new cork default.
In this commit we make sure to use the GCS bucket for dev container tests by
providing the required credentials and the associated fetch command.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
The cl.basic and cl.internet tests are different tests, which wasn't
clear before. Also, the grep process returns an exit code of 1 if it
didn't find a match, causing the job to cancel. The list of tests is
space-separated and should not be quoted, but we still have to handle
a literal *.
Look for the right test and handle the grep exit code, and disable
globbing in the subshell to preserve a literal *.
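Both points can be sketched in a few lines. The `tests` list, the `print_args` helper and the `cl.update` pattern are hypothetical and only stand in for the real test list and kola invocation.

```shell
# Hypothetical test list; cl.* must reach the consumer as a literal pattern.
tests="cl.basic cl.internet cl.*"

# Hypothetical stand-in for the command that receives one argument per test.
print_args() {
  for arg in "$@"; do
    echo "arg: ${arg}"
  done
}

# The subshell keeps 'set -f' (noglob) from leaking into the caller.
# ${tests} is left unquoted on purpose so that it is split on spaces.
(
  set -f
  print_args ${tests}
)

# grep exits with 1 when nothing matches; '|| true' keeps a shell running
# under 'set -e' from aborting on that expected outcome.
matches="$(echo "${tests}" | tr ' ' '\n' | grep -Fx 'cl.update' || true)"
echo "matches: '${matches}'"
```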
The Linux 5.10 stable kernel introduced a regression that we didn't
catch because we only run kola on one hardware type in Equinix Metal.
Validate that a simple network test works on various instance types of
the current hardware generation.
The RETURN trap has some weird semantics that seem to trip us up after updating
bash to 5.1. We tried to use it inside functions to clean up some
stuff after the function returns. This can be emulated with an EXIT trap
within a subshell. Fortunately, none of the users of the RETURN trap
set any global variables: modifications of such variables are
local to the subshell and are lost when the subshell exits.
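A minimal sketch of the emulation, with hypothetical function names. Writing the function body in parentheses makes it run in a subshell, so the EXIT trap fires when the function finishes.

```shell
# The body runs in a subshell, so the EXIT trap acts like a RETURN trap.
with_scratch_dir() (
  scratch="$(mktemp -d)"
  trap 'rm -rf "${scratch}"' EXIT
  touch "${scratch}/work"
  echo "${scratch}"  # report the path so the caller can check the cleanup
)

dir="$(with_scratch_dir)"
[ -d "${dir}" ] || echo "scratch directory was removed on function exit"

# The trade-off: variable changes made inside the subshell are lost.
counter=0
bump() ( counter=$((counter + 1)) )
bump
echo "counter=${counter}"  # prints counter=0
```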
Included is a Dockerfile that installs system deps of kola in a debian:11
image. For the test script, the control flow is:
qemu_uefi.sh
qemu_uefi_arm64.sh
(docker)
qemu_common.sh
qemu_common uses the 'NATIVE_ARM64' variable passed by the Jenkins job to control the behavior.
The differences are:
* use git directly to fetch (and verify) the manifest
* set up some symlinks so that /var/tmp is on the same BTRFS partition as $PWD/tmp
* set up symlinks so that we don't have to fix up installation of mantle to chroot
* run things directly instead of in chroot through cork
The whole script is executed as root, because kola requires root privileges
anyway and making kvm and sudo work with an arbitrary host user inside the
container would require a custom entrypoint to set up groups.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This requires passing the --azure-hyper-v-generation=V2 argument to kola. The
vhd/image is the same as for Azure gen1 VMs; the azure_gen2 specifier is only
for Jenkins usage.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The newly enabled update test performs an update from the built image
to itself. This is useful to test that the update mechanism didn't
break but it doesn't say whether the built image will be accepted as an update
from the previous official release.
Introduce an additional kola run that begins from the previous official
release and tests updating to the built image. Since the test does two
updates it also covers the case of updating from the built image to the
built image. Thus, we can skip the test in the normal run.
This new kola run is done first to keep the qemu-latest symlink valid
for the main test suite.
The kola update tests need a dev-key-signed update payload. This was
lacking and caused the update tests to be skipped.
Generate the test update payload for both dev builds and release builds
and run the kola tests for both. The test update payload has a special
name to not confuse it with the real update payload for releases, and
we keep the previous behavior to sign releases. Therefore, the
generate_update function wasn't used; instead, the extract_update function
was extended to generate the additional test payload.
The logic of the inline bash scripts of each job was sometimes
separated into the flatcar-scripts/jenkins/*.sh helpers but mostly
part of the Groovy file. This coupling had its advantages but also
downsides when special cases needed to be added for different release
versions. Other issues were that the inline scripts needed the
backslash character to be escaped twice and Jenkins was not good at
terminating the child processes when stopping a job. Having inline
bash scripts in Groovy also mandated the use of Jenkins to build and
release Flatcar Container Linux which hinders test builds in other CI
platforms.
Move the inline bash scripts fully to the files in
flatcar-scripts/jenkins/ and create new ones for jobs that didn't have
a script there yet. Also invoke them through a systemd-run wrapper
script which ensures that all child processes are terminated and also
sets up /opt/bin as additional path for the static lbzcat binary.
A workaround for bash 4 was needed: a temporary file is used instead of
the <(cmd) bash feature, which caused a strange syntax error. Otherwise,
the bash commands are moved as they are.
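The shape of the temporary-file workaround can be sketched as follows. The `list_items` and `count_lines` functions are hypothetical stand-ins for the real producer and consumer commands.

```shell
# Hypothetical producer and a consumer that expects a file path.
list_items() { printf '%s\n' b a c; }
count_lines() { wc -l < "$1" | tr -d ' '; }

# Instead of: count_lines <(list_items)
tmpfile="$(mktemp)"
list_items > "${tmpfile}"
count="$(count_lines "${tmpfile}")"
rm -f "${tmpfile}"

echo "lines: ${count}"
```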
Setting the invalid CCACHE_ variables resulted in strange failures
in projects depending on newer versions of meson, such as 0.55.3. For
example, the systemd build fails with errors like the following:
```
* ACCESS DENIED: utimes: /mnt/host/source/ccache
* ACCESS DENIED: utimes: /mnt/host/source/ccache
F: utimes
S: deny
P: /mnt/host/source/ccache
A: /mnt/host/source/ccache
R: /mnt/host/source/ccache
C: ccache cc /build/amd64-usr/var/tmp/portage/sys-apps/systemd-246/work/systemd-246-abi_x86_64.amd64/meson-private/sanitycheckc.c -o /build/amd64-usr/var/tmp/portage/sys-apps/systemd-246/work/systemd-246-abi_x86_64.amd64/meson-private/sanitycheckc.exe -O1 -pipe -pipe -D_FILE_OFFSET_BITS=64
```
We should not set up ccache at all, as it has already been disabled in
the coreos-overlay repo.