The kola test run time shouldn't be longer than the GC duration to
prevent failing tests caused by GC interference.
Align the Azure kola timeout with the GC duration.
With the limit of 2 parallel tests, meaning 6 machines, the test time
is ~10 hours which is longer than the GC time. It seems that the
regional capacity is not so limited at the moment and we can try to
increase the number of machines.
Adjust the timeout to reflect the GC time and increase the parallel
tests to 3, meaning 9 machines.
GCP Pro is failing because hostname is > 63 char:
```
Apr 5 19:52:27.522820 kubelet[1762]: E0405 19:52:27.522513 1762 kubelet_node_status.go:93] "Unable to register node with API server" err="Node \"jenkins-gce-pro-5-91a967ef5450cb932bc5.c.flatcar-212911.internal\" is invalid: metadata.labels: Invalid value: \"jenkins-gce-pro-5-91a967ef5450cb932bc5.c.flatcar-212911.internal\": must be no more than 63 characters" node="jenkins-gce-pro-5-91a967ef5450cb932bc5.c.flatcar-212911.internal"
```
Let's remove `jenkins` and `gce` from the hostname, these
information are not critical for debugging purposes.
Hostname should now looks like
"basic-5-91a967ef5450cb932bc5.c.flatcar-212911.internal" or
"pro-5-91a967ef5450cb932bc5.c.flatcar-212911.internal"
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
There was a regression for an instance type which could have been
prevented by testing.
Add the same extended test logic used for Equinix Metal to AWS.
The default branch of both repos, coreos-overlay and portage-stable,
should be `main`. If we checkout `master` branch, which contains
invalid source code that was deprecated many years ago, the build could
sometimes fail, e.g. when trying to build perl 5.26.2 with gcc 10.
Simply delete the code checking out branches, as the part is already
being handled in emerge-gitclone.
number of test increased. While we don't have yet a way to reduce
testing time, let's increase the timeout.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
We override `PARALLEL_TESTS`, because kola run with PARALLEL_TESTS >= 4
causes the tests to provision >= 12 ARM servers at the same time. As the
da11 region does not have that many free ARM servers, the whole tests
will fail. With PARALLEL_TESTS=2 the total number of servers stays < 10.
In addition, we override `timeout` to 10 hours, because it takes more
than 8 hours to run all tests only with 2 tests in parallel.
Equinix Metal ARM server are not yet hourly available in the default `sv15` region
so we override the `PACKET_REGION` to `Dallas` since it's available in this region.
We do not override `PACKET_REGION` for both board on top level because we need to keep proximity
for PXE booting.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
Currently the kubeadm tests fail on arm64 because the instance type
only offers 1 vCPU:
cluster.go:117: error execution phase preflight: [preflight] Some fatal errors occurred:
cluster.go:117: [ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
Switch to the next larger instance type which has 2 vCPUS.
if the test is ran for ARM64, there is no need to run `update_chroot`
since there is no SDK.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
The SDK can either be a release SDK or a dev build SDK which are stored
in different paths. DOWNLOAD_ROOT_SDK should be based on the
SDK_URL_PATH value which indicates whether it's a release or dev build
path.
Otherwise, it was failing since we check for unbound variable:
```
/bin/bash: line 1: PORTAGE_BINHOST: unbound variable
```
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
`$verify_key` actually holds `--verify-key=verify.asc` so of course
`systemd-nspawn` fails since it does not expect `--verify-key` value.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
because we need to pass google credentials to update_chroot, and 'cork update'
doesn't support that.
Add --sdk-url-path to sdk.sh for new cork default.
in this commit we make sure to use GCS bucket for dev container tests by
providing the required credentials and the associated fetch command.
Signed-off-by: Mathieu Tortuyaux <mtortuyaux@microsoft.com>
The cl.basic and cl.internet tests are different tests which wasn't
clear before. Also, the grep process returns an exit code of 1 if it
didn't find a match, causing the job to cancel. The list of tests is
space separated and should not be quoted but on the other hand, we
do have to handle a literal *.
Look for the right test and handle the grep exit code, and disable
globs for the subshell for preserving a literal *.
The Linux 5.10 stable kernel introduced a regression that we didn't
catch because we only run kola on one hardware type in Equinix Metal.
Validate that a simple network test works on various instance types of
the current hardware generation.
Included is a dockerfile that installs system deps of kola in an debian:11
image. For the test script, the control flow is:
qemu_uefi.sh
qemu_uefi_arm64.sh
(docker)
qemu_common.sh
qemu_common uses the 'NATIVE_ARM64' variable passed by the jenkins job to control the behavior.
The differences are:
* use git directly to fetch (and verify) the manifest
* setup some symlinks so that /var/tmp is on the same BTRFS partition as $PWD/tmp
* setup symlinks so that we don't have to fixup installation of mantle to chroot
* run things directly instead of in chroot through cork
The whole script is executed as root, because kola requires root privileges
anyway and making kvm and sudo work with an arbitrary host user inside the
container would require a custom entrypoint to setup groups.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This requires passing the --azure-hyper-v-generation=V2 argument to kola. The
vhd/image is the same as for azure gen1 vms, the azure_gen2 specifier is only
for jenkins usage.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The newly enabled update test performs an update from the built image
to itself. This is useful to test that the update mechanism didn't
break but it doesn't say if the built image will be accepted as update
from the previous official release.
Introduce an additional kola run that begins from the previous official
release and tests to update to the built image. Since the test does two
updates it also covers the case of updating from the built image to the
built image. Thus, we can skip the test in the normal run.
This new kola run is done first to keep the qemu-latest symlink valid
for the main test suite.
The kola update tests need a dev-key-signed update payload. This was
lacking and caused the update tests to be skipped.
Generate the test update payload for both dev builds and release builds
and run the kola tests for both. The test update payload has a special
name to not confuse it with the real update payload for releases, and
we keep the previous behavior to sign releases. Therefore, the
generate_update function wasn't used but the extract_update function
extended with generating the additional test payload.
The logic of the inline bash scripts of each job was sometimes
separated into the flatcar-scripts/jenkins/*.sh helpers but mostly
part of the Groovy file. This coupling had its advantages but also
downsides when special cases needed to be added for different release
versions. Other issues were that the inline scripts needed the
backslash character to be escaped twice and Jenkins was not good in
terminating the child processes when stopping a job. Having inline
bash scripts in Groovy also mandated the use of Jenkins to build and
release Flatcar Container Linux which hinders test builds in other CI
platforms.
Move the inline bash scripts fully to to the files in
flatcar-scripts/jenkins/ and create new ones for job that didn't have
a script there yet. Also invoke them through a systemd-run wrapper
script which ensures that all child processes are terminated and also
sets up /opt/bin as additional path for the static lbzcat binary.
A workaround for bash 4 was needed to use a temporary file instead of
the <(cmd) bash feature which caused a strange syntax error, otherwise
the bash commands are moved as they are.