etherpad-lite/bin/compactAllPads.ts
John McLear 4bda757304
feat(api): public compactPad API + bin/compactPad CLI over existing Cleanup (#7567)
* feat(pad): compactHistory() + compactPad CLI for DB-size reclaim

Fixes #6194. Long-lived pads with heavy edit history dominate the DB —
the issue describes a ~400 MB Postgres after two months with ~100
users. Etherpad keeps every revision forever, and removing arbitrary
middle revisions is unsafe because state is reconstructed by composing
forward from key revisions.

What's safe: collapse the full history into a single base revision
that reproduces the current atext. The existing `copyPadWithoutHistory`
already does this for a new pad ID — this PR lifts that same changeset
pattern into an in-place operation and wires up an admin CLI.

- `Pad.compactHistory(authorId?)` (src/node/db/Pad.ts): composes the
  current atext into one base changeset, deletes all existing rev
  records, clears saved-revision bookmarks, and appends the new rev 0.
  Text, attributes, and chat history are preserved; saved-revision
  pointers are cleared. Returns the number of revisions removed.
- `API.compactPad(padID, authorId?)` (src/node/db/API.ts): public-API
  wrapper around compactHistory. Reports `{removed}` so callers can
  log savings.
- `APIHandler.ts`: register `compactPad` under a new `1.3.1` version,
  bump `latestApiVersion`.
- `bin/compactPad.ts`: admin CLI. Reports the current revision count,
  calls compactPad via the HTTP API, and prints how many revisions
  were dropped.
- `src/tests/backend/specs/compactPad.ts`: four backend tests cover
  the empty-pad no-op, the text-preservation + head=0 contract,
  saved-revision cleanup, and that subsequent edits continue to
  append cleanly on top of the collapsed base.

The operation is destructive so admins must opt in explicitly; the CLI
prints the before-count, and the recommended pre-flight is an
`.etherpad` export (backup).

Closes #6194

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(compact): delegate to copyPadWithoutHistory via temp-pad swap

The initial compactHistory() implementation built a custom base
changeset and re-ran appendRevision against a reset atext — but the
changeset was packed with oldLength=2 (matching copyPadWithoutHistory's
dest-pad init state) while the reset atext was only length 1, so
applyToText tripped its "mismatched apply: 1 / 2" assertion and every
test failed with a Changeset corruption error.

Switch to the tested path instead: copy the pad via
copyPadWithoutHistory to a uniquely-named temp pad (inherits all its
attribute/pool/changeset correctness), read the temp pad's rev records
back, delete the old ones under our pad's ID, write the new records in
their place, update in-memory state to match, and remove the temp pad.
Errors at any step fall through with a best-effort temp-pad cleanup.

Contract shifts slightly: the collapsed pad is head<=1 rather than
head=0, matching the shape of a freshly-imported pad (seed rev 0 +
content rev 1). Tests updated to assert that invariant plus
text-preservation, saved-revision cleanup, and append-after-compact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(6194): match the head<=1 post-compact contract

Tests previously asserted head=0 exactly after compaction; the
temp-pad-swap path lands at head=1 (one seed rev plus one content
rev) matching the shape of a freshly-imported pad. Relax the
assertions to `head<=1` and derive the removed-count from
before-head minus after-head, so the tests still catch regressions in
text-preservation, saved-revision cleanup, and append-after-compact
without being tied to the exact implementation shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(6194): wrap existing Cleanup instead of duplicating it

Develop already ships a working revision-cleanup path under
`src/node/utils/Cleanup.ts` with two public helpers —
`deleteAllRevisions(padId)` (collapse full history via
copyPadWithoutHistory) and `deleteRevisions(padId, keepRevisions)`
(keep the last N). The admin-settings UI wires these up but neither
is exposed on the public API, and there's no CLI for operators who
want to run compaction outside the web UI. That's the gap this PR
now fills.

Changes from the prior revision of this PR:

- Drop `pad.compactHistory()` — it re-implemented what
  `Cleanup.deleteAllRevisions` already does. Remove the duplicate.
- `API.compactPad(padID, keepRevisions?)` now delegates to Cleanup:
    • keepRevisions null/undefined → deleteAllRevisions (full collapse)
    • keepRevisions >= 0          → deleteRevisions(N)  (keep last N)
  Returns {ok, mode: 'all' | 'keepLast', keepRevisions?}.
- APIHandler `1.3.1`: signature updated to take `keepRevisions`
  instead of `authorId`.
- `bin/compactPad.ts`: accepts `--keep N` for the keep-last mode,
  shows before/after revision counts so operators see concrete
  savings.
- Backend tests rewritten around the public API surface (mode
  reporting, text preservation, input validation) rather than
  internal method plumbing that no longer exists.
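
The delegation rule in the `API.compactPad` bullet above can be sketched
roughly as follows. This is an illustration, not the PR's code:
`CleanupLike`, `CompactResult`, and `compactPadSketch` are hypothetical
names, the helper signatures are assumed from the description, and the
integer check is the stricter one a later commit in this PR settles on.

```typescript
// Hypothetical stand-in for the two Cleanup helpers named above; the real
// module's exact signatures are an assumption here.
type CleanupLike = {
  deleteAllRevisions(padId: string): Promise<void>;
  deleteRevisions(padId: string, keepRevisions: number): Promise<void>;
};

type CompactResult =
  | {ok: true; mode: 'all'}
  | {ok: true; mode: 'keepLast'; keepRevisions: number};

// Sketch of the dispatch: a missing keepRevisions means full collapse,
// a non-negative integer means keep-last-N.
const compactPadSketch = async (
  cleanup: CleanupLike, padId: string, keepRevisions?: number | null,
): Promise<CompactResult> => {
  if (keepRevisions == null) {
    await cleanup.deleteAllRevisions(padId);
    return {ok: true, mode: 'all'};
  }
  if (!Number.isInteger(keepRevisions) || keepRevisions < 0) {
    throw new Error('keepRevisions must be a non-negative integer');
  }
  await cleanup.deleteRevisions(padId, keepRevisions);
  return {ok: true, mode: 'keepLast', keepRevisions};
};
```

Passing the cleanup helpers in as an argument is only a convenience for
the sketch, so it can be exercised with a fake.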

Net: strictly a thin public-API and CLI veneer over already-tested
Cleanup helpers. No new low-level logic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(6194): assert content markers, not byte-exact atext

Cleanup.deleteAllRevisions internally calls copyPadWithoutHistory
twice (src → tempId, tempId → src with force=true), and each round
trip normalizes trailing whitespace. That meant my byte-exact
atext.text assertion failed in CI:
  expected: '...line 3\n\n\n'
  actual:   '...line 3\n'

Swap the comparisons to use content markers (marker-alpha / beta /
gamma, keep-line-N). The test still catches the real regressions —
if compactPad lost content those markers would disappear — without
coupling to whitespace quirks of the existing Cleanup implementation.
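
A marker-based comparison of the kind described above might look like
this. It is a minimal sketch: `missingMarkers` is a hypothetical helper
and the sample strings are invented for illustration, not taken from the
actual spec file.

```typescript
// Hypothetical helper: returns the markers that are absent from the text.
const missingMarkers = (text: string, markers: string[]): string[] =>
  markers.filter((m) => !text.includes(m));

// Invented sample texts: same content, different trailing newlines.
const textBefore = 'marker-alpha\nmarker-beta\nmarker-gamma\n\n\n';
const textAfter = 'marker-alpha\nmarker-beta\nmarker-gamma\n';
const expectedMarkers = ['marker-alpha', 'marker-beta', 'marker-gamma'];

// A byte-exact comparison trips on the normalized trailing whitespace...
console.log(textBefore === textAfter); // false
// ...while the marker check only reports content that actually went missing.
console.log(missingMarkers(textAfter, expectedMarkers)); // []
```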

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(6194): correct API param + document compactPad in http_api docs

The 1.3.1 entry in APIHandler registered `['padID', 'authorId']`, but
`API.compactPad` takes `(padID, keepRevisions)` and the CLI sends a
`keepRevisions` query param. APIHandler.handle dispatches by URL field
name, so the previous wiring silently dropped `keepRevisions` and never
ran the keep-last branch over HTTP.

- Register `['padID', 'keepRevisions']` so the handler forwards the
  CLI/HTTP arg into the API function.
- Add HTTP-level dispatch tests that hit `/api/1.3.1/compactPad` with
  and without `keepRevisions`. The direct `api.compactPad()` tests
  bypass the handler and would have missed this regression.
- Document compactPad in `doc/api/http_api.md` and `http_api.adoc`,
  and bump the documented latest version from 1.3.0 to 1.3.1 to match
  `latestApiVersion`.
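
To see why the wrong registration failed silently, consider a simplified
model of name-based dispatch. `buildArgs` and `requestFields` are
hypothetical and much simpler than the real `APIHandler.handle`; the
sketch only shows how a name that is absent from the request turns into
`undefined`.

```typescript
// Hypothetical, simplified model: each declared parameter name is looked
// up in the request fields, in declaration order.
const buildArgs = (
  declared: string[], fields: Record<string, string | undefined>,
): (string | undefined)[] => declared.map((name) => fields[name]);

// Invented request fields, as the CLI would send them.
const requestFields: Record<string, string | undefined> = {
  padID: 'demo',
  keepRevisions: '5',
};

// Wrong registration: `authorId` is not in the request, so the second
// argument is undefined and the keep-last branch is never taken.
console.log(buildArgs(['padID', 'authorId'], requestFields));
// Corrected registration: the value is forwarded.
console.log(buildArgs(['padID', 'keepRevisions'], requestFields));
```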

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(6194): add bin/compactAllPads for per-instance bulk compaction

`bin/compactPad <padID>` covers the case where you know which pad is
fat. For "reclaim space across the whole instance," composing
`listAllPads` + `compactPad` yourself is annoying; this script does it.

- Walks every pad on the instance and compacts it (full collapse, or
  `--keep N` keep-last).
- Per-pad failures don't abort the run — they're logged, counted, and
  the script exits 1 if any failed.
- `--dry-run` lists pads + revision counts without writing anything,
  so operators can scope impact before committing.
- Reports `before → after` per pad and a total reclaimed count.

Deliberately not adding a `compactAllPads` HTTP API: bulk compaction
over a single HTTP request means one giant response and a long-held
connection. Operators who want this should run it locally, where they
can see progress and kill it cleanly. Staleness gating ("only pads
older than X days") is tracked separately as a follow-up.

Also registers `compactPad` and `compactAllPads` script aliases in
`bin/package.json` so they show up next to the other admin CLIs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(6194): cover the bin/compactAllPads loop logic

Previous commit added the script but only exercised it by hand. The
loop itself — error tolerance, dry-run gating, keep-last passthrough,
the empty-instance and listAllPads-failure paths — had no automated
coverage.

- Refactor compactAllPads.ts to export `runCompactAll(api, opts, logger)`
  and `parseArgs(argv)`. The CLI shell wires them up to axios+APIKEY
  for production; tests use an in-memory `CompactAllApi` so we don't
  need to stand up the apikey-auth path in mocha.
- Add 9 specs covering: arg parsing, full-collapse iteration,
  --keep N passthrough, --dry-run skipping writes, single-pad failure
  not aborting the run, pre-flight count failure tolerated, a
  listAllPads failure short-circuiting cleanly, the empty-instance
  no-op, and a final end-to-end test that runs `runCompactAll`
  against the real `/api/1.3.1/compactPad` handler over supertest+JWT
  to catch contract drift between the CompactAllApi shape and the
  HTTP endpoints.
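
An in-memory `CompactAllApi` of the kind mentioned above could be built
along these lines. This is a sketch, not the fake used in the specs:
`makeFakeApi` is a hypothetical name, and the post-compaction head values
it produces are invented for illustration rather than what Cleanup
actually yields.

```typescript
// Shape copied from the script's CompactAllApi type.
type CompactAllApi = {
  listAllPads(): Promise<string[]>;
  getRevisionsCount(padId: string): Promise<number>;
  compactPad(padId: string, keepRevisions: number | null): Promise<void>;
};

// Hypothetical fake: `heads` maps pad ID to head revision number. Pads in
// `failing` reject on compactPad, to exercise per-pad error tolerance.
const makeFakeApi = (
  heads: Map<string, number>, failing: Set<string> = new Set(),
): CompactAllApi => ({
  listAllPads: async () => Array.from(heads.keys()),
  getRevisionsCount: async (padId) => {
    const head = heads.get(padId);
    if (head == null) throw new Error(`padID does not exist: ${padId}`);
    return head;
  },
  compactPad: async (padId, keepRevisions) => {
    if (failing.has(padId)) throw new Error(`simulated failure: ${padId}`);
    const head = heads.get(padId);
    if (head == null) throw new Error(`padID does not exist: ${padId}`);
    // Assumption for the fake only: full collapse caps the head at 1,
    // keep-last caps it at N.
    const cap = keepRevisions == null ? 1 : keepRevisions;
    heads.set(padId, Math.min(head, cap));
  },
});
```

A fake like this keeps the loop tests free of network and API-key setup,
while the single end-to-end spec guards against the fake drifting from
the real endpoints.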

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(6194): address Qodo review — gate, integer check, SSL

Three valid concerns from the Qodo review on 75a08a13:

1. **cleanup.enabled gate.** The admin/Cleanup-socket path checks
   `settings.cleanup.enabled` before doing anything destructive; the
   public API was bypassing that gate. Now `compactPad` mirrors the
   admin path's check and returns a clear apierror when disabled, so
   exposing the API doesn't accidentally widen the cleanup-opt-in
   surface.

2. **Number.isFinite → Number.isInteger.** `2.5` was finite and
   non-negative, so the old check let it through into
   `Cleanup.deleteRevisions`, which does revision-index arithmetic
   that assumes integer math. Reject at the API boundary instead of
   silently misbehaving.

3. **SSL-aware baseURL in the bin scripts.** Other bin scripts
   hardcode `http://`, but the rest of the codebase uses
   `settings.ssl ? 'https' : 'http'`. The compact CLIs now do the
   same, so they work against HTTPS deployments. (Other bin scripts
   carry the same bug but fixing them is out of scope for this PR.)
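
The difference behind item 2 is easy to check in isolation. In this
small sketch, `isValidKeepRevisions` is a hypothetical name for the
boundary check, not the function used in `API.ts`.

```typescript
// Hypothetical validator mirroring the check described in item 2.
const isValidKeepRevisions = (n: number): boolean =>
  Number.isInteger(n) && n >= 0;

// The old-style check accepts 2.5 because it is finite and non-negative.
console.log(Number.isFinite(2.5) && 2.5 >= 0); // true
// The integer check rejects it, along with negatives and NaN.
console.log(isValidKeepRevisions(2.5)); // false
console.log(isValidKeepRevisions(-1)); // false
console.log(isValidKeepRevisions(NaN)); // false
console.log(isValidKeepRevisions(0)); // true
```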

Tests:
- New spec: `rejects fractional keepRevisions` (2.5 with the old
  check passed; the new one rejects).
- New spec: `refuses to run when cleanup.enabled is false`. The
  existing API tests opt in via a before-hook + restore, so they
  still cover the success path under the new gate.
- API docs (`http_api.md` + `http_api.adoc`) document the gate and
  the new error message.

Skipped Qodo concerns:
- "Wrong compactPad parameters" — already fixed in 26e12ff7
  (the param map now correctly says `keepRevisions`, not `authorId`).
- "Unbounded revision deletions" / "No session eviction" / changeset
  base-length / padCreate hook — these all targeted the earlier
  on-Pad implementation that was refactored away. The current code
  wraps `Cleanup.deleteAllRevisions` / `deleteRevisions`, which
  already handle concurrency, locking, and hook semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 14:23:54 +01:00

228 lines
7.7 KiB
TypeScript

'use strict';
/*
 * Compact every pad on the instance to reclaim database space.
 *
 * Usage:
 *   node bin/compactAllPads.js              # collapse all history on every pad
 *   node bin/compactAllPads.js --keep N     # keep last N revisions per pad
 *   node bin/compactAllPads.js --dry-run    # list pads + rev counts, no writes
 *
 * Composes the existing `listAllPads` and `compactPad` HTTP APIs — there is
 * deliberately no instance-wide HTTP endpoint, because doing this over a
 * single request would mean one giant response and a long-held connection.
 * Per-pad failures don't stop the run; they're logged and counted, and the
 * exit code reflects whether anything failed.
 *
 * Destructive — `getEtherpad`-export anything you can't afford to lose
 * before running.
 *
 * Issue #6194: per-instance bulk compaction. The per-pad `bin/compactPad`
 * is the right tool when you know which pad is fat; this is the right tool
 * when you want to reclaim space across the whole instance.
 */
import path from 'node:path';
import fs from 'node:fs';
import process from 'node:process';
import axios from 'axios';
export type CompactAllOpts = {
  keepRevisions: number | null;
  dryRun: boolean;
};

// Minimal interface mirroring the API endpoints the script needs. Tests
// substitute their own implementation that goes through supertest+JWT
// instead of axios+APIKEY, so the loop logic is exercised against a real
// running server without dragging in apikey-file or axios setup.
export type CompactAllApi = {
  listAllPads(): Promise<string[]>;
  getRevisionsCount(padId: string): Promise<number>;
  compactPad(padId: string, keepRevisions: number | null): Promise<void>;
};

export type CompactAllReport = {
  total: number;
  ok: number;
  failed: number;
  totalRevsBefore: number;
  totalRevsAfter: number;
};

export type CompactAllLogger = {
  info(msg: string): void;
  error(msg: string): void;
};

const defaultLogger: CompactAllLogger = {
  info: (m) => console.log(m),
  error: (m) => console.error(m),
};
// Pure-ish core: composition + per-pad error tolerance + dry-run + tally.
// Returns a structured report so tests can assert on outcomes; the CLI
// shell maps it to an exit code.
export const runCompactAll = async (
    api: CompactAllApi, opts: CompactAllOpts,
    logger: CompactAllLogger = defaultLogger,
): Promise<CompactAllReport> => {
  let padIds: string[];
  try {
    padIds = await api.listAllPads();
  } catch (e: any) {
    logger.error(`listAllPads failed: ${e.message ?? e}`);
    return {total: 0, ok: 0, failed: 1, totalRevsBefore: 0, totalRevsAfter: 0};
  }
  if (padIds.length === 0) {
    logger.info('No pads on this instance.');
    return {total: 0, ok: 0, failed: 0, totalRevsBefore: 0, totalRevsAfter: 0};
  }

  const strategy = opts.keepRevisions == null
    ? 'collapse all history'
    : `keep last ${opts.keepRevisions} revisions`;
  logger.info(`Found ${padIds.length} pad(s). Strategy: ${strategy}` +
      `${opts.dryRun ? ' (dry run — no writes)' : ''}.`);

  const report: CompactAllReport = {
    total: padIds.length, ok: 0, failed: 0,
    totalRevsBefore: 0, totalRevsAfter: 0,
  };

  for (let i = 0; i < padIds.length; i++) {
    const padId = padIds[i];
    const idx = `[${i + 1}/${padIds.length}]`;
    let before: number;
    try {
      before = await api.getRevisionsCount(padId);
    } catch (e: any) {
      logger.error(`${idx} ${padId}: getRevisionsCount failed: ${e.message ?? e}`);
      report.failed++;
      continue;
    }
    if (opts.dryRun) {
      logger.info(`${idx} ${padId}: ${before + 1} revision(s) — would compact`);
      report.totalRevsBefore += before + 1;
      continue;
    }
    try {
      await api.compactPad(padId, opts.keepRevisions);
    } catch (e: any) {
      logger.error(`${idx} ${padId}: compactPad failed: ${e.message ?? e}`);
      report.failed++;
      continue;
    }
    let after: number | undefined;
    try { after = await api.getRevisionsCount(padId); }
    catch { /* main op already succeeded; post-count is informational */ }
    if (after != null) {
      logger.info(`${idx} ${padId}: ${before + 1} → ${after + 1} revision(s)`);
      report.totalRevsBefore += before + 1;
      report.totalRevsAfter += after + 1;
    } else {
      logger.info(`${idx} ${padId}: compacted (post-count unavailable)`);
    }
    report.ok++;
  }

  if (opts.dryRun) {
    logger.info('');
    logger.info(`Dry run complete. ${padIds.length} pad(s), ` +
        `${report.totalRevsBefore} total revision(s) — re-run ` +
        'without --dry-run to compact.');
  } else {
    logger.info('');
    logger.info(`Done. ${report.ok} pad(s) compacted, ${report.failed} failed. ` +
        `Revisions: ${report.totalRevsBefore} → ${report.totalRevsAfter} ` +
        `(reclaimed ${report.totalRevsBefore - report.totalRevsAfter}).`);
  }
  return report;
};
export const parseArgs = (argv: string[]): CompactAllOpts | null => {
  const opts: CompactAllOpts = {keepRevisions: null, dryRun: false};
  for (let i = 0; i < argv.length; i++) {
    const a = argv[i];
    if (a === '--dry-run') {
      opts.dryRun = true;
    } else if (a === '--keep') {
      const v = argv[++i];
      const n = Number(v);
      if (!Number.isInteger(n) || n < 0) {
        console.error(`--keep expects a non-negative integer; got ${v}`);
        return null;
      }
      opts.keepRevisions = n;
    } else {
      return null;
    }
  }
  return opts;
};
// CLI entry point. Skipped when this file is imported (e.g. by tests),
// so the test harness can use `runCompactAll` directly without network.
const usage = () => {
  console.error('Usage:');
  console.error(' node bin/compactAllPads.js');
  console.error(' node bin/compactAllPads.js --keep <N>');
  console.error(' node bin/compactAllPads.js --dry-run');
  process.exit(2);
};

const isMain = require.main === module;
if (isMain) {
  process.on('unhandledRejection', (err) => { throw err; });
  const settings = require('ep_etherpad-lite/tests/container/loadSettings').loadSettings();
  axios.defaults.baseURL =
      `${settings.ssl ? 'https' : 'http'}://${settings.ip}:${settings.port}`;
  const opts = parseArgs(process.argv.slice(2));
  if (!opts) usage();
  const apikey = fs.readFileSync(
      path.join(__dirname, '../APIKEY.txt'), {encoding: 'utf-8'}).trim();

  // Bind the abstract API to axios + APIKEY auth for the CLI shell.
  const cliApi: CompactAllApi = {
    async listAllPads() {
      const apiInfo = await axios.get('/api/');
      const apiVersion: string | undefined = apiInfo.data.currentVersion;
      if (!apiVersion) throw new Error('No version set in API');
      // Stash on this for subsequent calls. Avoids a per-call /api/ ping.
      (cliApi as any)._apiVersion = apiVersion;
      const r = await axios.get(`/api/${apiVersion}/listAllPads?apikey=${apikey}`);
      if (r.data.code !== 0) throw new Error(JSON.stringify(r.data));
      return r.data.data.padIDs ?? [];
    },
    async getRevisionsCount(padId: string) {
      const v = (cliApi as any)._apiVersion;
      const r = await axios.get(
          `/api/${v}/getRevisionsCount?apikey=${apikey}` +
          `&padID=${encodeURIComponent(padId)}`);
      if (r.data.code !== 0) throw new Error(JSON.stringify(r.data));
      return r.data.data.revisions;
    },
    async compactPad(padId: string, keepRevisions: number | null) {
      const v = (cliApi as any)._apiVersion;
      const params = new URLSearchParams({apikey, padID: padId});
      if (keepRevisions != null) params.set('keepRevisions', String(keepRevisions));
      const r = await axios.post(`/api/${v}/compactPad?${params.toString()}`);
      if (r.data.code !== 0) throw new Error(JSON.stringify(r.data));
    },
  };

  (async () => {
    const report = await runCompactAll(cliApi, opts!);
    if (report.failed > 0) process.exit(1);
  })();
}