Now URLs for torcx packages are always present in the torcx manifest,
but for releases they may point to the origin server where the
packages will eventually be uploaded. At the time of running the
tests, those packages exist only in the build cache, so rewrite the
URLs to point to the build cache so that the test can pass.
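A minimal sketch of that rewrite, assuming the manifest is the JSON
file torcx_manifest.json and using made-up origin and build-cache host
names:

    # Point the package URLs at the build cache instead of the origin
    # server before running the kola test (host names are illustrative).
    sed -i \
        -e 's,https://origin.example.com/torcx/,https://build-cache.example.com/torcx/,g' \
        torcx_manifest.json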
A torcx manifest may contain paths and URLs as locations of
packages. There are two kinds of packages - vendored and
extra. Vendored packages normally have two locations - a path to the
directory inside the image where the package resides (which is why it
is called vendored) and a URL to the package on some remote
server. Extra packages have only a URL. But the URLs are added only
when we tell the build_torcx_store script to upload the packages at
the same time, which is what the old build pipeline was doing. With
the new pipeline, the upload happens as a separate step, so the upload
is disabled when invoking build_torcx_store and the packages do not
get any URLs set. This change went unnoticed because the kola test
checking the generated torcx manifest only verified that there is at
least one location, either a path or a URL, and all the new releases
have no extra packages, only vendored ones.
When backporting the new pipeline to the old LTS, the kola tests
started to fail because the old LTS had one extra package; this is how
I noticed the problem.
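For illustration, one way to see which kinds of locations each package
version carries, assuming a manifest layout of
value.packages[].versions[].locations[] entries that hold either a
path or a url (the layout is an assumption here, not taken from the
change itself):

    # Vendored packages should list both "path" and "url", extra
    # packages only "url". A check that merely requires at least one
    # location of any kind still passes for a vendored package whose
    # URL is missing.
    jq -r '.value.packages[] | .name as $n | .versions[] |
           "\($n) \(.version): \([.locations[] | keys[]] | join(","))"' \
        torcx_manifest.json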
The old pipeline had a release job where mantle's plume release tool
was invoked to publish the cloud images.
Implement a release job in the new pipeline with the same goals and,
eventually, even more automation.
To make it easier to review the image changes and the changelog, and
to iterate on them without rebuilding the image when fixes are needed,
move this logic to its own file where a new job could call it.
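A sketch of the refactoring pattern; the file name, function name, and
path below are made up:

    # image_changes.sh: the report generation lives in a function that
    # any job can source and call.
    show_image_changes() {
        local vernum="$1"
        # Generate the package diff and the changelog against the
        # previous release here, reusing the already built image.
        echo "image changes for ${vernum}"
    }

A new job can then simply do:

    source ci-automation/image_changes.sh
    show_image_changes "${vernum}"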
When started by the Flatcar core user, the SDK failed to use UID 500
because inside the SDK there is already a core user from nss-altfiles
with the same ID. As a result, the SDK user kept UID 1000 and ran into
permission errors.
Allow reusing an existing ID for the SDK user. However, this only
works when usermod doesn't find a process that uses this ID, and we
had a race between the SDK entry points called by "docker start" and
by "docker exec". The race is unwanted anyway because we don't want to
execute the commands while setup_board is still running. Solve it by
setting the entrypoint for "docker start" directly to "bash -l" in
"docker create" (this is also what the entry point does as its last
step: sudo su -l).
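A sketch of the two parts of the fix; --non-unique/--uid and
--entrypoint are standard usermod and docker options, while the user,
container, and image names are illustrative:

    # Let the SDK user take over UID 500 even though nss-altfiles
    # already lists a core user with that ID.
    sudo usermod --non-unique --uid 500 sdk

    # Create the container so that "docker start" drops straight into
    # a login shell instead of racing the entry point run via
    # "docker exec".
    docker create --name "${container_name}" --entrypoint /bin/bash \
        "${sdk_image}" -l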
The SDK container has a copy of sdk_entry.sh for standalone use. This
copy was also used by run_sdk_container, which meant that changes only
took effect after building new SDK container images.
Use the repository's version from run_sdk_container so that fixes take
effect without requiring new SDK containers.
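A sketch of the idea, assuming the repository checkout is bind-mounted
into the container; the mount path and the way the script is invoked
are assumptions, not the actual run_sdk_container code:

    # Run the entry script from the checked-out repository instead of
    # the copy baked into the SDK image, so edits take effect without
    # rebuilding the image.
    docker exec "${container_name}" \
        bash /mnt/host/source/src/scripts/sdk_entry.sh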
Instead of printing failed tests like this:

    Failed tests: kubeadm.v1.25.0.cilium.base
    kubeadm.v1.24.1.cilium.base

Do it like this:

    Failed tests:
    kubeadm.v1.25.0.cilium.base
    kubeadm.v1.24.1.cilium.base
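A minimal sketch of the new output, with an illustrative array holding
the failed test names:

    echo "Failed tests:"
    for failed in "${failed_tests[@]}"; do
        echo "${failed}"
    done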
We set success to true when the test cycle was broken, which was a
hacky way to avoid printing the give-up message. But setting success
to true also meant that the script returned with exit status 0, which
is wrong.
Add another variable for controlling whether the give-up message is
printed.
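A sketch of the pattern, where run_tests and should_abort_early stand
in for the real logic and all names are illustrative:

    success=false
    print_give_up=true
    for try in $(seq 1 "${retries}"); do
        if run_tests; then
            success=true
            break
        fi
        if should_abort_early; then
            # Breaking the cycle no longer pretends the run succeeded.
            print_give_up=false
            break
        fi
    done
    if [ "${success}" = false ] && [ "${print_give_up}" = true ]; then
        echo "Giving up."
    fi
    # The exit status now reflects the real result.
    [ "${success}" = true ]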
The new m3.small instance type does not have official Flatcar support
yet, but we can already cover it in our PXE boot release tests. The
c3.small instances are legacy; m3.small is the new smallest type.
We were running the run_sdk_container script, passing the value of a
variable named version to it through the -v flag, but that variable is
defined nowhere. This worked under Jenkins, because the Jenkins job
has a version parameter that gets exported into the environment under
the same name. Running the script manually outside Jenkins revealed
the bug.
The script should have been using the vernum variable. The difference
between this variable and the version variable is that "version" was
in the form <channel>-<version>-<build_id>, whereas "vernum" comes
without the channel part. Fortunately, "run_sdk_container" was
stripping the channel part before using this value, so it makes no
difference whether we pass main-3333.0.0-some-id or just
3333.0.0-some-id.
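For illustration, the equivalence in terms of bash parameter expansion
(this is not the script's exact code, just a demonstration of why both
forms work):

    version="main-3333.0.0-some-id"
    vernum="3333.0.0-some-id"
    # Stripping everything up to the first dash removes the channel.
    [ "${version#*-}" = "${vernum}" ] && echo "same value either way"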