etherpad-lite/packaging/test-local.sh
John McLear 0b40bfc784
feat(packaging): add Debian (.deb) build via nfpm with systemd unit (v2) (#7583)
* feat(packaging): add Debian (.deb) build via nfpm with systemd unit

First-class Debian packaging for Etherpad, producing
etherpad_<version>_<arch>.deb artefacts for amd64 and arm64 from a
single nfpm manifest. Installing the package gives users:

- /opt/etherpad with a prebuilt, self-contained node_modules/ — no
  pnpm required at runtime, just `nodejs (>= 20)`.
- etherpad system user/group, created via `adduser` in preinst.
- /etc/etherpad/settings.json seeded from the template on first
  install, preserved across upgrades, removed on `purge`. Seed rewrites
  dbType from the template's dev-only `dirty` default to `sqlite`,
  pointed at /var/lib/etherpad/etherpad.db so fresh installs get an
  ACID-safe DB without manual config. sqlite is shipped by ueberdb2
  (rusty-store-kv), so no additional apt deps are needed.
- /var/lib/etherpad owned by etherpad:etherpad, writable under the
  hardened unit's ProtectSystem=strict.
- /lib/systemd/system/etherpad.service — hardened unit
  (NoNewPrivileges, ProtectSystem=strict, ProtectHome, PrivateTmp,
  RestrictAddressFamilies) with Restart=on-failure.
- /usr/bin/etherpad CLI wrapper running `node --import tsx/esm`.
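
The settings seeding described in the list above could look roughly like this. It is a minimal sketch, not the actual postinstall script: the helper name `seed_settings` is hypothetical, and it assumes the template contains the literal strings `"dbType": "dirty"` and `"filename": "var/dirty.db"`.

```shell
# Hypothetical helper: copy the template to DEST only when DEST does not
# exist yet, switching the dev-only "dirty" backend to sqlite on the way.
# Assumes the template contains the literal strings matched below.
seed_settings() {
  local template="$1" dest="$2" db_path="$3"
  # Preserve an existing config across upgrades.
  [ -e "${dest}" ] && return 0
  sed -e 's|"dbType": "dirty"|"dbType": "sqlite"|' \
      -e "s|\"filename\": \"var/dirty.db\"|\"filename\": \"${db_path}\"|" \
      "${template}" > "${dest}"
}

# Usage sketch (not executed here; paths taken from the description above):
#   seed_settings /opt/etherpad/settings.json.template \
#     /etc/etherpad/settings.json /var/lib/etherpad/etherpad.db
```

Returning early when the destination exists is what keeps a locally edited settings.json intact on upgrade.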

CI (.github/workflows/deb-package.yml) triggers on v* tags, builds both
arches via native runners (ubuntu-latest + ubuntu-24.04-arm),
smoke-tests the amd64 package end-to-end (install → verify sqlite
default → systemctl start → curl /health → purge → confirm user
removed), and attaches the artefacts to the GitHub Release.

Re-introduces the work from #7559 (reverted in #7582) with two
corrections:

1. Package name and all installed paths use `etherpad`, not
   `etherpad-lite` — matches the repo rename. Kept replaces/conflicts
   on `etherpad-lite` so any dev builds of the reverted PR upgrade
   cleanly.
2. Default dbType is `sqlite`, not `dirty`. The template's own comment
   says dirty is for testing only; shipping it by default to everyone
   who runs `apt install etherpad` is the wrong tradeoff for a
   production package.

Publishing to an APT repo (Cloudsmith, Launchpad PPA, self-hosted
reprepro) is intentionally out of scope — needs a governance decision
on who holds the signing key. Recipes are documented in
packaging/README.md.

Refs #7529, #7559, #7582

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(packaging): address PR review — startup crashes, supply chain, Node LTS

Addresses Qodo and SamTV12345 review feedback on #7583:

- postinstall: symlink /opt/etherpad/var → /var/lib/etherpad/var so
  ProtectSystem=strict doesn't block runtime writes (var/js,
  installed_plugins.json, etc.). Existing ReadWritePaths covers it.
- postinstall: seed installed_plugins.json with ep_etherpad-lite so
  checkForMigration() does not spawn `pnpm ls` on first boot — pnpm is
  not a runtime dep, and the bundled node_modules already contains
  every shipped plugin. Prevents network plugin installs at first run.
- postremove: clean up the new var symlink on remove.
- workflow: verify nfpm .deb sha256 against upstream checksums.txt
  before sudo dpkg -i (defense in depth).
- workflow: bump Node 22 → 24 (current LTS, per SamTV12345). The deb
  Depends stays at nodejs (>= 20) to match Etherpad's engines.node.
- workflow: smoke-test now asserts the var symlink and seeded
  installed_plugins.json exist post-install.
- workflow: publish stable etherpad-latest_{amd64,arm64}.deb aliases
  alongside the versioned files in the GitHub Release.
- README: bump Node guidance to 24, document /releases/latest URL,
  link to engines.node floor.
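
The checksum check that precedes `sudo dpkg -i` can be sketched as follows. The helper name `verify_sha256` and the file names in the usage line are hypothetical; the sketch assumes the checksum file uses the usual `sha256sum` layout (one `<hash>  <filename>` entry per line).

```shell
# Hypothetical helper: verify FILE against its entry in CHECKSUMS
# (sha256sum format). Fails if the entry is missing or the hash differs.
verify_sha256() {
  local file="$1" checksums="$2" name expected actual
  name="$(basename "${file}")"
  # Pick the hash recorded for exactly this file name ("*name" marks
  # binary mode in sha256sum output).
  expected="$(awk -v n="${name}" '$2 == n || $2 == ("*" n) { print $1; exit }' "${checksums}")"
  [ -n "${expected}" ] || { echo "no checksum entry for ${name}" >&2; return 1; }
  actual="$(sha256sum "${file}" | awk '{ print $1 }')"
  [ "${expected}" = "${actual}" ]
}

# Usage sketch (not executed here; file names are placeholders):
#   verify_sha256 nfpm_amd64.deb checksums.txt && sudo dpkg -i nfpm_amd64.deb
```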

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(packaging): tsx CJS hook, plugin paths writable, glob tag triggers

Addresses second-round Qodo review on #7583:

- bin/etherpad: switch from `--import tsx/.../esm` to `--require
  tsx/cjs`. server.ts uses `exports.start = ...` which throws under
  the ESM loader; the prod script in src/package.json uses tsx/cjs
  for the same reason.
- postinstall: symlink /opt/etherpad/src/plugin_packages →
  /var/lib/etherpad/plugin_packages and chgrp /opt/etherpad/src/node_modules
  to etherpad with mode 2775. Otherwise admin-UI plugin installs fail
  with EACCES, because those are the dirs LinkInstaller writes to.
- systemd unit: add /opt/etherpad/src/node_modules to ReadWritePaths
  so symlink creation by the etherpad user is allowed under
  ProtectSystem=strict. plugin_packages is already covered via the
  symlink into /var/lib/etherpad.
- postremove: clean up the new plugin_packages symlink on remove.
- workflow: tag filters were `v[0-9]+.[0-9]+.[0-9]+`, but Actions tag
  filters are globs, not regex. `[0-9]+` matches one character, so
  multi-digit tags like v2.10.0 would never trigger. Switch to
  `v*.*.*` / `v*.*.*-*`, matching handleRelease.yml.
- workflow smoke test now asserts plugin_packages symlink target,
  ownership of plugin_packages and node_modules.
- test-local.sh: new script that builds the .deb and runs the same
  smoke test in a throwaway systemd-enabled Docker container, so
  failures are caught before pushing.
- README: document test-local.sh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(packaging): test-local.sh — fix cgroups v2, add --no-systemd mode

- systemd-in-docker on cgroups v2 needs --cgroupns=host and a writable
  /sys/fs/cgroup mount; the previous :ro version booted to nothing.
- New --no-systemd mode: drops the systemd container in favour of plain
  ubuntu:24.04 + manual launch under the etherpad user. Validates the
  postinstall, wrapper, plugin paths, and /health without depending on
  the host's systemd-in-docker setup. Use it when --privileged systemd
  containers don't boot on your kernel/docker combo.
- On systemd container exit the script now dumps the last 50 log lines
  and points at --no-systemd as the fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(packaging): test-local.sh — reuse cached image in --no-systemd

If ubuntu:24.04 isn't on disk and the registry is unreachable, fall
back to whichever ubuntu/debian image is already cached (e.g. the
jrei/systemd-ubuntu image we pulled for the systemd path). Avoids a
registry round-trip on flaky networks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: handle spawn errors in run_cmd; deb-package install order + offline-safe test

src/node/utils/run_cmd.ts:
  Without `proc.on('error', ...)` a spawn failure (e.g. ENOENT for a
  missing binary) is emitted as an unlistened 'error' event, which
  Node treats as an uncaught exception that bypasses the awaiting
  try/catch and kills the process. The .deb hits this on first boot
  because plugins.ts spawns `pnpm --version` for a startup log line
  and pnpm isn't a runtime dep — Etherpad logs "Starting" then
  immediately stops. Reject the promise on 'error' so the existing
  try/catch in the caller actually catches it.

packaging/scripts/postinstall.sh:
  chown /var/lib/etherpad/plugin_packages AFTER `cp -a` from the
  staged tree — `cp -a` preserves source (root) ownership and was
  re-rooting the directory we'd just chowned to etherpad. Same
  ordering the var symlink block already used.

packaging/test-local.sh:
  Run `CI=1 pnpm install --frozen-lockfile` before staging so the
  package is built from a fresh, lockfile-consistent tree (matches
  CI). Fixes spurious "Cannot find module 'X'" failures from stale
  local symlinks pointing at out-of-date pnpm store paths.

End-to-end test now passes: the postinstall assertions pass, /health
returns 200, and dpkg --purge cleans up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: gitignore packaging build artefacts; drop accidental commit

Drop packaging/etc/settings.json.dist that snuck into the previous
commit (generated at build time by test-local.sh / CI from
settings.json.template). Add /staging/, /dist/, /packaging/etc/ to
.gitignore so they don't recur.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(plugins): downgrade missing-pnpm log from ERROR to debug

The startup IIFE that logs the pnpm version is informational only.
pnpm is a dev-only dependency: admin-UI plugin install goes through
live-plugin-manager directly, and plugin migration is short-circuited
when var/installed_plugins.json is present (e.g. on packaged
installs). A missing pnpm on PATH is therefore expected on hardened
deployments and shouldn't surface as a red ERROR in journalctl.

Detect ENOENT specifically and log at debug; treat other errors
(permission denied, etc.) as warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(packaging): smoke deb on PRs + backend test for run_cmd spawn errors

CI gap: deb-package.yml only fired on v* tag pushes, so a PR that
broke the .deb wasn't caught until release time. Wire it to PRs and
develop pushes via a paths filter covering packaging files and the
runtime files Etherpad needs at first boot. The release job already
gates on `if: startsWith(github.ref, 'refs/tags/v')` so PR runs
won't try to publish.

Test gap: the run_cmd.ts spawn-error fix (commit 5eee7895a) had no
test, which is how the bug shipped originally — plugins.ts spawned
`pnpm --version` at startup, the rejection was never caught, and
the .deb crashed mid-boot. Add a backend spec that exercises:
  - ENOENT for a missing binary -> rejects (regression test)
  - successful command -> resolves stdout
  - non-zero exit -> rejects with code

backend-tests.yml's recursive mocha glob picks up the new spec
automatically; no workflow change needed there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(packaging-ci): use NodeSource LTS for the smoke test (was Ubuntu's node 18)

ubuntu-latest's default apt nodejs is 18.19.1, but our package requires
nodejs (>= 20). The smoke test was doing `apt-get install nodejs`
followed by `dpkg -i ... || apt-get install -f`, which on a node-18
host fails the dep check, then `-f` "fixes" by REMOVING the etherpad
package — and the next assertion (test -x /usr/bin/etherpad) crashes.

Match what packaging/test-local.sh and the README recommend: install
node from NodeSource (current LTS) before installing the .deb.
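
One way to make this failure mode explicit is to check the Node major version before calling `dpkg -i`, so an old Node aborts the test instead of letting `apt-get install -f` remove the package. This is a sketch only; `node_major_ok` is a hypothetical helper and is not part of the workflow.

```shell
# Hypothetical guard: succeed only if a `node --version` string (e.g.
# "v24.1.0") has a major version of at least MIN (default 20, matching
# the package's `nodejs (>= 20)` dependency).
node_major_ok() {
  local version="${1#v}" min="${2:-20}"
  local major="${version%%.*}"
  # A non-numeric major makes the comparison fail, which is what we want.
  [ "${major}" -ge "${min}" ] 2>/dev/null
}

# Usage sketch (not executed here):
#   node_major_ok "$(node --version)" || { echo "need nodejs >= 20" >&2; exit 1; }
#   dpkg -i /dist/etherpad_*_amd64.deb
```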

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(packaging-ci): sudo-prefix smoke assertions that read /etc/etherpad

postinstall sets /etc/etherpad to 0750 root:etherpad (DB creds live
here) and /var/lib/etherpad similarly. The GH Actions runner user
isn't in the etherpad group, so 'test -f /etc/etherpad/settings.json'
hits EACCES. Add sudo to each check that crosses one of those dirs.

(Wrapping the whole block in `sudo bash <<EOF` would have been
cleaner but YAML literal-block + heredoc terminator don't play well
together at this indent.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(packaging): close chown -R symlink-deref escalation; Pre-Depends adduser

postinstall:
  Use `chown -hR` instead of `chown -R` on /var/lib/etherpad/var and
  /var/lib/etherpad/plugin_packages. Both directories are writable by
  the unprivileged etherpad service user, so a symlink planted there
  could redirect root's chown onto arbitrary system files (e.g.
  /etc/shadow) on the next `apt upgrade`. -hR makes chown act on the
  symlink itself rather than its target — standard mitigation for this
  TOCTOU-style local privilege escalation.
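
The mitigation can be sketched as a small wrapper. The name `chown_tree_nofollow` is hypothetical; the real postinstall may simply call `chown -hR` inline.

```shell
# Hypothetical wrapper: recursively set ownership of TREE without
# following symlinks found inside it. With -h, chown changes the link
# itself, so a link planted by the service user cannot redirect the
# ownership change onto a file outside the tree.
chown_tree_nofollow() {
  local owner="$1" tree="$2"
  chown -hR -- "${owner}" "${tree}"
}

# Usage sketch (not executed here; needs root):
#   chown_tree_nofollow etherpad:etherpad /var/lib/etherpad/var
```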

nfpm:
  Move adduser from Depends to Pre-Depends. preinst creates the
  etherpad user before unpacking; with plain `dpkg -i` (no apt) the
  Depends list isn't installed beforehand, so a minimal system without
  adduser would fail preinst before unpack and apt-get -f couldn't
  recover. Pre-Depends guarantees adduser is configured first.

Both flagged in Qodo's persistent review of 3daf300f0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(packaging): predepends lives at top-level deb:, not under overrides

nfpm's Overridables schema doesn't include predepends; it's a deb-only
top-level field. Previous commit nested it under overrides.deb, which
caused nfpm to reject the entire manifest with "field predepends not
found in type nfpm.Overridables" and broke both arch builds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(packaging): four Qodo follow-ups (CI ordering, secure node install, disable on remove, writable settings)

deb-package.yml:
  - Move 'Resolve version' (which calls `node -p`) to AFTER setup-node
    so it doesn't depend on the runner image preinstalling node.
  - Replace `curl ... | sudo bash` NodeSource installer with the
    explicit gpg-key + sources.list approach. Same outcome (NodeSource
    LTS apt repo), but no execution of network-fetched code as root.
    Reduces blast radius if NodeSource's setup endpoint is ever
    compromised — we only trust the signed apt repo metadata.
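
A rough sketch of the key-plus-sources.list approach follows. The keyring path, key URL, and the `nodistro main` suite are assumptions based on NodeSource's manual install instructions, not copied from the workflow, and `nodesource_list_line` is a hypothetical helper.

```shell
# Hypothetical helper: print the apt source line for a NodeSource repo.
# The keyring path and the "nodistro main" suite are assumptions, not
# values taken from deb-package.yml.
nodesource_list_line() {
  local major="$1" keyring="${2:-/etc/apt/keyrings/nodesource.gpg}"
  printf 'deb [signed-by=%s] https://deb.nodesource.com/node_%s.x nodistro main\n' \
    "${keyring}" "${major}"
}

# Usage sketch (not executed here; needs root and network access):
#   curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key \
#     | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg
#   nodesource_list_line 24 > /etc/apt/sources.list.d/nodesource.list
#   apt-get update && apt-get install -y nodejs
```

Only the key download touches the network directly; everything installed afterwards is verified by apt against that key.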

postinstall.sh:
  - /etc/etherpad/settings.json now etherpad:etherpad mode 0660 (was
    root:etherpad 0640). The admin /admin/settings UI persists changes
    by writing back to settings.settingsFilename; with the previous
    perms the etherpad user could read but not write, so saving via
    the admin UI failed silently. Group-only access preserved (DB
    creds still unreadable by other users).

postremove.sh:
  - On `dpkg --remove`, run `systemctl disable etherpad.service` before
    `daemon-reload` so the wants/ symlink doesn't dangle after dpkg
    deletes the unit file.
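
The ordering can be sketched like this. The function name and the `SYSTEMCTL` override are hypothetical and exist only so the sequence can be exercised without a real systemd; the actual postremove.sh may be structured differently.

```shell
# Hypothetical postremove fragment. ACTION is the first argument dpkg
# passes to the maintainer script (e.g. "remove", "upgrade").
postremove_unit_cleanup() {
  local action="$1" systemctl="${SYSTEMCTL:-systemctl}"
  case "${action}" in
    remove)
      # Drop the wants/ symlink first, then make systemd re-read units.
      "${systemctl}" disable etherpad.service || true
      "${systemctl}" daemon-reload || true
      ;;
  esac
}

# Usage sketch (not executed here), as the body of a postrm script:
#   postremove_unit_cleanup "$1"
```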

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(packaging): narrow workflow token scope; pin local nfpm to NFPM_VERSION

deb-package.yml:
  The workflow-level `permissions` was `contents: write`, so the build job got
  write access on every PR run, even though only the release job needs
  it (to attach release assets). Narrow the workflow default to
  `contents: read` and let the release job opt back in to write — it
  already declares its own job-level `contents: write` block, so this
  is just removing an over-broad default.

test-local.sh:
  The script defined NFPM_VERSION but then unconditionally ran
  `goreleaser/nfpm:latest`, so local builds could diverge from CI's
  pinned v2.43.0. Use the variable in the docker tag (stripping the
  leading "v" to match the image's tag scheme).
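
The tag mapping relies on plain shell parameter expansion. A minimal sketch, with `nfpm_image` as a hypothetical helper name:

```shell
# Hypothetical helper: map an nfpm release tag (e.g. "v2.43.0") to the
# goreleaser/nfpm image reference, whose tags omit the leading "v".
nfpm_image() {
  local version="$1"
  # ${version#v} removes one leading "v" if present and is a no-op otherwise.
  printf 'goreleaser/nfpm:%s\n' "${version#v}"
}

# Usage sketch (not executed here):
#   docker run --rm "$(nfpm_image "${NFPM_VERSION}")" --version
```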

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 10:33:30 +01:00


#!/usr/bin/env bash
# Build the .deb locally and run it through the same smoke test as CI,
# in a throwaway systemd-enabled Docker container. Mirrors the steps in
# .github/workflows/deb-package.yml so failures here predict CI failures.
#
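# Usage: packaging/test-local.sh               # build + smoke test
#        packaging/test-local.sh --shell       # leave the container running after the smoke test
#        packaging/test-local.sh --build-only  # build the .deb and skip the smoke test
#        packaging/test-local.sh --no-systemd  # plain container, manual launch instead of systemd
#
# Requirements: docker, node, pnpm, dpkg-deb. nfpm runs from the
# goreleaser/nfpm Docker image, so it does not need to be installed.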
set -euo pipefail
REPO_ROOT="$(git rev-parse --show-toplevel)"
cd "${REPO_ROOT}"
ARCH="${ARCH:-amd64}"
NFPM_VERSION="${NFPM_VERSION:-v2.43.0}"
SYSTEMD_IMAGE="${SYSTEMD_IMAGE:-jrei/systemd-ubuntu:24.04}"
CONTAINER_NAME="${CONTAINER_NAME:-etherpad-deb-test}"
MODE=smoke
NO_SYSTEMD=
for arg in "$@"; do
  case "$arg" in
    --shell)      MODE=shell ;;
    --build-only) MODE=build ;;
    --no-systemd) NO_SYSTEMD=1 ;;
    *) echo "unknown arg: $arg" >&2; exit 2 ;;
  esac
done
echo "==> Refreshing dependencies (matches CI)"
# CI=1 makes pnpm non-interactive (so it doesn't prompt on a clean reinstall).
CI=1 pnpm install --frozen-lockfile
echo "==> Building staging tree"
rm -rf staging dist packaging/etc
mkdir -p staging/opt/etherpad packaging/etc dist
cp -a src bin package.json pnpm-workspace.yaml README.md LICENSE node_modules \
  staging/opt/etherpad/
printf 'packages:\n - src\n - bin\n' > staging/opt/etherpad/pnpm-workspace.yaml
cp settings.json.template packaging/etc/settings.json.dist
echo "==> Building .deb via nfpm ${NFPM_VERSION} (in container)"
VERSION="$(node -p 'require("./package.json").version')"
# Pin to NFPM_VERSION so local builds match what CI produces. The
# goreleaser/nfpm tag drops the leading "v".
docker run --rm \
  -v "${REPO_ROOT}":/w -w /w \
  -e VERSION="${VERSION}" -e ARCH="${ARCH}" \
  "goreleaser/nfpm:${NFPM_VERSION#v}" \
  package --packager deb -f packaging/nfpm.yaml --target dist/
DEB_FILE="$(ls dist/etherpad_*_${ARCH}.deb | head -1)"
echo "==> Built: ${DEB_FILE}"
dpkg-deb -I "${DEB_FILE}" | sed 's/^/ /'
if [ "${MODE}" = "build" ]; then
  exit 0
fi
docker rm -f "${CONTAINER_NAME}" >/dev/null 2>&1 || true
trap '[ "${MODE}" = shell ] || docker rm -f "${CONTAINER_NAME}" >/dev/null 2>&1 || true' EXIT
if [ -z "${NO_SYSTEMD}" ]; then
  echo "==> Launching systemd container (${SYSTEMD_IMAGE})"
  # systemd-in-docker on cgroups v2 needs: --privileged, --cgroupns=host,
  # rw mount of /sys/fs/cgroup, and tmpfs for /run + /run/lock.
  if ! docker run -d --name "${CONTAINER_NAME}" \
      --privileged --cgroupns=host \
      --tmpfs /tmp --tmpfs /run --tmpfs /run/lock \
      -v /sys/fs/cgroup:/sys/fs/cgroup:rw \
      -v "${REPO_ROOT}/dist":/dist:ro \
      -p 9001:9001 \
      "${SYSTEMD_IMAGE}" >/dev/null; then
    echo "!! docker run failed; rerun with --no-systemd to skip the systemd path."
    exit 1
  fi
  echo "==> Waiting for systemd in container to be ready"
  ready=
  for i in $(seq 1 30); do
    state="$(docker inspect -f '{{.State.Status}}' "${CONTAINER_NAME}" 2>/dev/null || echo missing)"
    if [ "${state}" != "running" ]; then
      echo "!! container exited (state=${state}). Last logs:"
      docker logs "${CONTAINER_NAME}" 2>&1 | tail -50 || true
      echo
      echo "!! Tip: rerun with --no-systemd to skip the systemd-in-Docker"
      echo " step and validate everything else (postinstall, wrapper,"
      echo " plugin paths, /health under a manual launch)."
      exit 1
    fi
    if docker exec "${CONTAINER_NAME}" systemctl list-units --type=target >/dev/null 2>&1; then
      ready=1; break
    fi
    sleep 1
  done
  [ -n "${ready}" ] || { echo "!! systemd never came up"; docker logs "${CONTAINER_NAME}" 2>&1 | tail -50; exit 1; }
else
  # Reuse whichever ubuntu-ish image is already on disk to avoid a
  # registry round-trip (handy on flaky networks).
  PLAIN_IMAGE="${PLAIN_IMAGE:-}"
  if [ -z "${PLAIN_IMAGE}" ]; then
    for candidate in ubuntu:24.04 "${SYSTEMD_IMAGE}" ubuntu:latest debian:stable; do
      if docker image inspect "${candidate}" >/dev/null 2>&1; then
        PLAIN_IMAGE="${candidate}"
        break
      fi
    done
    : "${PLAIN_IMAGE:=ubuntu:24.04}"
  fi
  echo "==> Launching plain container (--no-systemd, image=${PLAIN_IMAGE})"
  docker run -d --name "${CONTAINER_NAME}" \
    --entrypoint /bin/sh \
    --tmpfs /tmp --tmpfs /run \
    -v "${REPO_ROOT}/dist":/dist:ro \
    -p 9001:9001 \
    "${PLAIN_IMAGE}" -c 'sleep infinity' >/dev/null
fi
echo "==> Installing nodejs + the .deb inside the container"
docker exec "${CONTAINER_NAME}" bash -lc '
  set -euo pipefail
  export DEBIAN_FRONTEND=noninteractive
  apt-get update -qq
  apt-get install -y -qq curl ca-certificates gnupg
  curl -fsSL https://deb.nodesource.com/setup_lts.x | bash - >/dev/null
  apt-get install -y -qq nodejs
  dpkg -i /dist/etherpad_*_'"${ARCH}"'.deb || apt-get install -f -y -qq
'
echo "==> Asserting postinstall results"
docker exec "${CONTAINER_NAME}" bash -lc '
  set -eux
  test -x /usr/bin/etherpad
  test -f /etc/etherpad/settings.json
  test -L /opt/etherpad/settings.json
  test -L /opt/etherpad/var
  [ "$(readlink /opt/etherpad/var)" = "/var/lib/etherpad/var" ]
  test -L /opt/etherpad/src/plugin_packages
  [ "$(readlink /opt/etherpad/src/plugin_packages)" = "/var/lib/etherpad/plugin_packages" ]
  test -d /var/lib/etherpad/plugin_packages
  [ "$(stat -c %U /var/lib/etherpad/plugin_packages)" = "etherpad" ]
  [ "$(stat -c %G /opt/etherpad/src/node_modules)" = "etherpad" ]
  test -f /var/lib/etherpad/var/installed_plugins.json
  grep -q "ep_etherpad-lite" /var/lib/etherpad/var/installed_plugins.json
  grep -q "\"dbType\": \"sqlite\"" /etc/etherpad/settings.json
  id etherpad
'
if [ -z "${NO_SYSTEMD}" ]; then
  echo "==> Starting etherpad.service"
  docker exec "${CONTAINER_NAME}" systemctl start etherpad
else
  echo "==> Starting etherpad manually (no systemd in container)"
  docker exec -d "${CONTAINER_NAME}" runuser -u etherpad -- \
    bash -c 'cd /opt/etherpad && NODE_ENV=production /usr/bin/etherpad >/tmp/etherpad.log 2>&1'
fi
echo "==> Waiting for /health"
ok=
for i in $(seq 1 30); do
  if docker exec "${CONTAINER_NAME}" curl -fsS http://127.0.0.1:9001/health >/dev/null 2>&1; then
    ok=1; break
  fi
  sleep 2
done
if [ -z "${ok}" ]; then
  echo "!! /health never responded — dumping logs:"
  if [ -z "${NO_SYSTEMD}" ]; then
    docker exec "${CONTAINER_NAME}" journalctl -u etherpad --no-pager -n 200 || true
  else
    docker exec "${CONTAINER_NAME}" tail -n 200 /tmp/etherpad.log || true
  fi
  exit 1
fi
echo "==> /health OK"
docker exec "${CONTAINER_NAME}" curl -fsS http://127.0.0.1:9001/health
echo
if [ "${MODE}" = "shell" ]; then
  echo
  echo "Container left running as '${CONTAINER_NAME}'. Useful commands:"
  echo " docker exec -it ${CONTAINER_NAME} bash"
  echo " docker exec ${CONTAINER_NAME} journalctl -u etherpad -f"
  echo " curl http://127.0.0.1:9001/"
  echo "Stop with: docker rm -f ${CONTAINER_NAME}"
  exit 0
fi
echo "==> Purging the package"
if [ -z "${NO_SYSTEMD}" ]; then
  docker exec "${CONTAINER_NAME}" systemctl stop etherpad
else
  docker exec "${CONTAINER_NAME}" pkill -f 'node.*server.ts' || true
fi
docker exec "${CONTAINER_NAME}" dpkg --purge etherpad
docker exec "${CONTAINER_NAME}" bash -c '! id etherpad 2>/dev/null'
echo "==> All checks passed."