archlinux-docker/REPRO.md
Robin Candau d0a2374d67
Expand repro documentation and ensure fixed timezome
Expand the repro documentation with missing bits:

- The Dockerfile needs to be regenerated with the correct group for title annotation to ensure reproducibility.
- The CI_COMMIT_SHA of the original pipeline needs to be honored in the Dockerfile.

Also, set the timezome to UTC in Makefile and scripts to ensure consistency in the generated dates / timestamps (e.g. ARCHIVE_SNAPSHOT / SOURCE_DATE_EPOCH), regardless of the timezone of the environment. Otherwise, someone rebuilding the image locally can unexpectedly end up with a different value for those if the system uses a different timezome.
2026-04-29 17:49:11 +02:00

207 lines
8.7 KiB
Markdown

# Reproducing a `repro` image locally
The `repro` image provides a bit for bit [reproducible build](https://reproducible-builds.org)
of the `base` image.
Note that, to ensure reproducibility, the pacman keys are stripped from this
image, so you're expected to run `pacman-key --init && pacman-key --populate archlinux`
before being able to update the system and install packages via `pacman` when using this image.
To reproduce the `repro` image locally, follow the below instructions.
## Disclaimer
Reproducible builds [expect the same build environment across builds](https://reproducible-builds.org/docs/definition/).
While it *should* be fine in most cases, this means we cannot guarantee that you will always be able
to successfully reproduce a specific image locally over time.
Technically speaking, the older the image you're trying to reproduce is, the more chance there is
to have more or less significant differences between your build environment
and the one used to build the original image (for instance in terms of packages versions).
Such differences can affect the build (and the resulting artifacts). Please note that failing to
reproduce an image locally does not necessarily mean that it isn't reproducible per se, but can
just be the result of significant enough differences between your build environment and the one
used to build the original image.
You can avoid (or mitigate) eventual issues due to such differences by restoring all packages
of your build environment to the build date of the original image (see the [related instructions
from the Arch Wiki](https://wiki.archlinux.org/title/Arch_Linux_Archive#Restore_all_packages_to_a_specific_date)).
## Dependencies
Install the following Arch Linux packages:
* make
* devtools
* git
* podman
* fakechroot
* fakeroot
* diffoscope (to optionally check the reproducibility of the rootFS)
* diffoci
## Prepare the build environment
Prepare the build environment by setting the following environment variables:
* `BUILD_VERSION`: The build version of the `repro` image you want to reproduce.
For instance, if you're aiming to reproduce the `repro-20260331.0.508794` image:
```bash
export BUILD_VERSION="20260331.0.508794"
```
* `ARCHIVE_SNAPSHOT`: The date of the Arch Linux repository archive snaphot to build
the image against. This is based on the date included in the image's `BUILD_VERSION`:
```bash
export ARCHIVE_SNAPSHOT=$(date -u -d "${BUILD_VERSION%%.*} -1 day" +"%Y/%m/%d")
```
* `SOURCE_DATE_EPOCH`: The value to normalize timestamps with during the build.
This is based on the date included in the image's `BUILD_VERSION`:
```bash
export SOURCE_DATE_EPOCH=$(date -u -d "${BUILD_VERSION%%.*} 00:00:00" +"%s")
```
Then pull the original image you're aiming to reproduce and set its revision value in your environment (needed to correctly set the revision annotation in the Dockerfile):
```bash
podman pull docker.io/archlinux/archlinux:repro-$BUILD_VERSION
export CI_COMMIT_SHA=$(podman inspect --format '{{ index .Config.Labels "org.opencontainers.image.revision" }}' archlinux/archlinux:repro-$BUILD_VERSION)
```
Finally, clone the [archlinux-docker](https://gitlab.archlinux.org/archlinux/archlinux-docker)
repository and move into it:
```bash
git clone https://gitlab.archlinux.org/archlinux/archlinux-docker.git
cd archlinux-docker
```
Note that all the following instructions assume that you are at the root of the
archlinux-docker repository cloned above.
## Build the rootFS and generate the Dockerfile
Build the rootFS with the required parameters:
```bash
make \
ARCHIVE_SNAPSHOT="$ARCHIVE_SNAPSHOT" \
SOURCE_DATE_EPOCH="$SOURCE_DATE_EPOCH" \
$PWD/output/Dockerfile.repro
scripts/make-dockerfile.sh repro.tar.zst repro output/ "true" "repro" "$SOURCE_DATE_EPOCH"
```
The following resulting artifacts will be located in `$PWD/output`:
* repro.tar.zst (the rootFS)
* repro.tar.zst.SHA256 (sha256 hash of the rootFS)
* Dockerfile.repro (the generated Dockerfile)
## Optional - Check the rootFS reproducibility
At that point, if the artifacts built for the image you're aiming to reproduce
are still available for download from the rootfs stage of the corresponding
[archlinux-docker pipeline](https://gitlab.archlinux.org/archlinux/archlinux-docker/-/pipelines)
, you can optionally compare the content of the `repro.tar.zst.SHA256`
file from the pipeline to the one generated during your local build (which
should be the same, indicating that the rootFS has been successfully reproduced).
Additionally, you can check differences between the `repro.tar.zst` tarball from
the pipeline and the one built during your local build with `diffoscope`
*(where `/tmp/repro.tar.zst` is the rootFS tarball downloaded from the pipeline and
`$PWD/output/repro.tar.zst` is the rootFS tarball built during your local build in the following example)*:
```bash
diffoscope /tmp/repro.tar.zst $PWD/output/repro.tar.zst
```
This should return no difference, acting as additional indicator that the rootFS has been
successfully reproduced.
## Build the image
You can now (re)build the image against the rootFS and the Dockerfile generated in the previous step.
To do so, build the image with the required parameters:
```bash
podman build \
--no-cache \
--source-date-epoch=$SOURCE_DATE_EPOCH \
--rewrite-timestamp \
-f "$PWD/output/Dockerfile.repro" \
-t "archlinux:repro-$BUILD_VERSION" \
"$PWD/output"
```
The built image will be accessible in your local podman container storage under the name:
`localhost/archlinux:repro-$BUILD_VERSION`.
## Check the image reproducibility
Compare the digest of the original image pulled from Docker Hub to the digest of the image you built
locally:
```bash
podman inspect --format '{{.Digest}}' docker.io/archlinux/archlinux:repro-$BUILD_VERSION
podman inspect --format '{{.Digest}}' localhost/archlinux:repro-$BUILD_VERSION
```
Both digests should be identical, indicating that the image has been successfully reproduced.
Additionally, you can check difference between the image pulled from Docker Hub and
the image you built locally with `diffoci`:
```bash
diffoci diff --ignore-image-name --verbose podman://docker.io/archlinux/archlinux:repro-$BUILD_VERSION podman://localhost/archlinux:repro-$BUILD_VERSION
```
This should show no difference, acting as additional indicator that the image has been
successfully reproduced *(see the following section about the `--ignore-image-name` flag requirement)*.
### Note about the necessity of the `--ignore-image-name` flag with `diffoci`
Docker / Podman does not allow to have two images with the same name & tag combination stored
locally, [making it impossible to compare two images with the same name with
`diffoci`](https://github.com/reproducible-containers/diffoci/issues/74).
To work around this limitation, one of the two image has to be named differently, whether by
setting a different name / tag combination at build time (as done in this guide) or by renaming
it post-build with e.g. `podman tag`.
However, the image name & tag combination is automatically reported (and updated in the case
of a renaming) in the image annotations and it's not possible to fully overwrite
it during build or update it post-build in a straightforward way.
This unavoidably introduces non-deterministic data in the image name annotations
that `diffoci` will systematically report by default.
See for instance the following `diffoci` output reporting a difference in the image name annotation:
```
Event: "DescriptorMismatch" (field "Annotations")
map[string]string{
"io.containerd.image.name": strings.Join({
- "docker.io/archlinux/archlinux:repro-20260331.0.508794",
+ "localhost/archlinux:repro-20260331.0.508794",
}, ""),
```
Given that it's currently not possible to have two images with the same name & tag
combination stored locally and that it's also not possible to "normalize" the related
annotations metadata during (or after) the build, we are currently [forced to ignore those with
the `--ignore-image-name` flag](https://github.com/reproducible-containers/diffoci/issues/266)
to workaround this technical constraint.
Regardless, we can attest that:
* This limitation is specific to metadata handling in container tooling and does not
affect the actual filesystem contents or runtime behavior of the image.
* The reported difference in the image name annotation when running `diffoci` without the `--ignore-image-name` flag
is (or is supposed to be, at least) the **only** difference being reported when comparing the two images.
* The image name annotation metadata are not part of the hashed object when generating the image digest,
meaning that this difference does not prevent digest equality between the two images (allowing
us to claim bit for bit reproducibility regardless).