mirror of
https://github.com/ether/etherpad-lite.git
synced 2026-05-05 20:26:49 +02:00
* feat(pad): compactHistory() + compactPad CLI for DB-size reclaim Fixes #6194. Long-lived pads with heavy edit history dominate the DB — the issue describes a ~400 MB Postgres after two months with ~100 users. Etherpad keeps every revision forever, and removing arbitrary middle revisions is unsafe because state is reconstructed by composing forward from key revisions. What's safe: collapse the full history into a single base revision that reproduces the current atext. The existing `copyPadWithoutHistory` already does this for a new pad ID — this PR lifts that same changeset pattern into an in-place operation and wires up an admin CLI. - `Pad.compactHistory(authorId?)` (src/node/db/Pad.ts): composes the current atext into one base changeset, deletes all existing rev records, clears saved-revision bookmarks, and appends the new rev 0. Text, attributes, and chat history are preserved; saved-revision pointers are cleared. Returns the number of revisions removed. - `API.compactPad(padID, authorId?)` (src/node/db/API.ts): public-API wrapper around compactHistory. Reports `{removed}` so callers can log savings. - `APIHandler.ts`: register `compactPad` under a new `1.3.1` version, bump `latestApiVersion`. - `bin/compactPad.ts`: admin CLI. Reports the current revision count, calls compactPad via the HTTP API, and prints how many revisions were dropped. - `src/tests/backend/specs/compactPad.ts`: four backend tests cover the empty-pad no-op, the text-preservation + head=0 contract, saved-revision cleanup, and that subsequent edits continue to append cleanly on top of the collapsed base. The operation is destructive so admins must opt in explicitly; the CLI prints the before-count, and the recommended pre-flight is an `.etherpad` export (backup). Closes #6194 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(compact): delegate to copyPadWithoutHistory via temp-pad swap The initial compactHistory() implementation built a custom base changeset and re-ran appendRevision against a reset atext — but the changeset was packed with oldLength=2 (matching copyPadWithoutHistory's dest-pad init state) while the reset atext was only length 1, so applyToText tripped its "mismatched apply: 1 / 2" assertion and every test failed with a Changeset corruption error. Switch to the tested path instead: copy the pad via copyPadWithoutHistory to a uniquely-named temp pad (inherits all its attribute/pool/changeset correctness), read the temp pad's rev records back, delete the old ones under our pad's ID, write the new records in their place, update in-memory state to match, and remove the temp pad. Errors at any step fall through with a best-effort temp-pad cleanup. Contract shifts slightly: the collapsed pad is head<=1 rather than head=0, matching the shape of a freshly-imported pad (seed rev 0 + content rev 1). Tests updated to assert that invariant plus text-preservation, saved-revision cleanup, and append-after-compact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(6194): match the head<=1 post-compact contract Tests previously asserted head=0 exactly after compaction; the temp-pad-swap path lands at head=1 (one seed rev plus one content rev) matching the shape of a freshly-imported pad. Relax the assertions to and derive the removed-count from before-head minus after-head, so the tests still catch regressions in text-preservation, saved-revision cleanup, and append-after-compact without being tied to the exact implementation shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(6194): wrap existing Cleanup instead of duplicating it Develop already ships a working revision-cleanup path under `src/node/utils/Cleanup.ts` with two public helpers — `deleteAllRevisions(padId)` (collapse full history via copyPadWithoutHistory) and `deleteRevisions(padId, keepRevisions)` (keep the last N). The admin-settings UI wires these up but neither is exposed on the public API, and there's no CLI for operators who want to run compaction outside the web UI. That's the gap this PR now fills. Changes from the prior revision of this PR: - Drop `pad.compactHistory()` — it re-implemented what `Cleanup.deleteAllRevisions` already does. Remove the duplicate. - `API.compactPad(padID, keepRevisions?)` now delegates to Cleanup: • keepRevisions null/undefined → deleteAllRevisions (full collapse) • keepRevisions >= 0 → deleteRevisions(N) (keep last N) Returns {ok, mode: 'all' | 'keepLast', keepRevisions?}. - APIHandler `1.3.1`: signature updated to take `keepRevisions` instead of `authorId`. - `bin/compactPad.ts`: accepts `--keep N` for the keep-last mode, shows before/after revision counts so operators see concrete savings. - Backend tests rewritten around the public API surface (mode reporting, text preservation, input validation) rather than internal method plumbing that no longer exists. Net: strictly a thin public-API and CLI veneer over already-tested Cleanup helpers. No new low-level logic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(6194): assert content markers, not byte-exact atext Cleanup.deleteAllRevisions internally calls copyPadWithoutHistory twice (src → tempId, tempId → src with force=true), and each round trip normalizes trailing whitespace. That meant my byte-exact atext.text assertion failed in CI: expected: '...line 3\n\n\n' actual: '...line 3\n' Swap the comparisons to use content markers (marker-alpha / beta / gamma, keep-line-N). The test still catches the real regressions — if compactPad lost content those markers would disappear — without coupling to whitespace quirks of the existing Cleanup implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(6194): correct API param + document compactPad in http_api docs The 1.3.1 entry in APIHandler registered `['padID', 'authorId']`, but `API.compactPad` takes `(padID, keepRevisions)` and the CLI sends a `keepRevisions` query param. APIHandler.handle dispatches by URL field name, so the previous wiring silently dropped `keepRevisions` and never ran the keep-last branch over HTTP. - Register `['padID', 'keepRevisions']` so the handler forwards the CLI/HTTP arg into the API function. - Add HTTP-level dispatch tests that hit `/api/1.3.1/compactPad` with and without `keepRevisions`. The direct `api.compactPad()` tests bypass the handler and would have missed this regression. - Document compactPad in `doc/api/http_api.md` and `http_api.adoc`, and bump the documented latest version from 1.3.0 to 1.3.1 to match `latestApiVersion`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(6194): add bin/compactAllPads for per-instance bulk compaction `bin/compactPad <padID>` covers the case where you know which pad is fat. For "reclaim space across the whole instance," composing `listAllPads` + `compactPad` yourself is annoying; this script does it. - Walks every pad on the instance and compacts it (full collapse, or `--keep N` keep-last). - Per-pad failures don't abort the run — they're logged, counted, and the script exits 1 if any failed. - `--dry-run` lists pads + revision counts without writing anything, so operators can scope impact before committing. - Reports `before → after` per pad and a total reclaimed count. Deliberately not adding a `compactAllPads` HTTP API: bulk compaction over a single HTTP request means one giant response and a long-held connection. Operators who want this should run it locally, where they can see progress and kill it cleanly. Staleness gating ("only pads older than X days") is tracked separately as a follow-up. Also registers `compactPad` and `compactAllPads` script aliases in `bin/package.json` so they show up next to the other admin CLIs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(6194): cover the bin/compactAllPads loop logic Previous commit added the script but only exercised it by hand. The loop itself — error tolerance, dry-run gating, keep-last passthrough, the empty-instance and listAllPads-failure paths — had no automated coverage. - Refactor compactAllPads.ts to export `runCompactAll(api, opts, logger)` and `parseArgs(argv)`. The CLI shell wires them up to axios+APIKEY for production; tests use an in-memory `CompactAllApi` so we don't need to stand up the apikey-auth path in mocha. - Add 9 specs covering: arg parsing, full-collapse iteration, --keep N passthrough, --dry-run skipping writes, single-pad failure not aborting the run, pre-flight count failure tolerated, a listAllPads failure short-circuiting cleanly, the empty-instance no-op, and a final end-to-end test that runs `runCompactAll` against the real `/api/1.3.1/compactPad` handler over supertest+JWT to catch contract drift between the CompactAllApi shape and the HTTP endpoints. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(6194): address Qodo review — gate, integer check, SSL Three valid concerns from the Qodo review on 75a08a13: 1. **cleanup.enabled gate.** The admin/Cleanup-socket path checks `settings.cleanup.enabled` before doing anything destructive; the public API was bypassing that gate. Now `compactPad` mirrors the admin path's check and returns a clear apierror when disabled, so exposing the API doesn't accidentally widen the cleanup-opt-in surface. 2. **Number.isFinite → Number.isInteger.** `2.5` was finite and non-negative, so the old check let it through into `Cleanup.deleteRevisions`, which does revision-index arithmetic that assumes integer math. Reject at the API boundary instead of silently misbehaving. 3. **SSL-aware baseURL in the bin scripts.** Other bin scripts hardcode `http://`, but the rest of the codebase uses `settings.ssl ? 'https' : 'http'`. The compact CLIs now do the same, so they work against HTTPS deployments. (Other bin scripts carry the same bug but fixing them is out of scope for this PR.) Tests: - New spec: `rejects fractional keepRevisions` (2.5 with the old check passed; the new one rejects). - New spec: `refuses to run when cleanup.enabled is false`. The existing API tests opt in via a before-hook + restore, so they still cover the success path under the new gate. - API docs (`http_api.md` + `http_api.adoc`) document the gate and the new error message. Skipped Qodo concerns: - "Wrong compactPad parameters" — already fixed in 26e12ff7 (the param map now correctly says `keepRevisions`, not `authorId`). - "Unbounded revision deletions" / "No session eviction" / changeset base-length / padCreate hook — these all targeted the earlier on-Pad implementation that was refactored away. The current code wraps `Cleanup.deleteAllRevisions` / `deleteRevisions`, which already handle concurrency, locking, and hook semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
93 lines
3.5 KiB
TypeScript
93 lines
3.5 KiB
TypeScript
'use strict';
|
|
|
|
/*
|
|
* Compact a pad's revision history to reclaim database space.
|
|
*
|
|
* Usage:
|
|
* node bin/compactPad.js <padID> # collapse all history
|
|
* node bin/compactPad.js <padID> --keep N # keep only the last N revisions
|
|
*
|
|
* Wraps the existing Cleanup helper (src/node/utils/Cleanup.ts) via the
|
|
* compactPad HTTP API so admins can trigger it from the CLI without
|
|
* routing through the admin settings UI. Destructive — export the pad as
|
|
* `.etherpad` first for backup.
|
|
*
|
|
* Issue #6194: long-lived pads with heavy edit history accumulate hundreds
|
|
* of megabytes in the DB; this tool is the per-pad brick for reclaiming
|
|
* that space without rotating to a new pad ID.
|
|
*/
|
|
import path from 'node:path';
|
|
import fs from 'node:fs';
|
|
import process from 'node:process';
|
|
import axios from 'axios';
|
|
|
|
// As of v14, Node.js does not exit when there is an unhandled Promise rejection. Convert an
|
|
// unhandled rejection into an uncaught exception, which does cause Node.js to exit.
|
|
process.on('unhandledRejection', (err) => { throw err; });
|
|
|
|
const settings = require('ep_etherpad-lite/tests/container/loadSettings').loadSettings();
|
|
|
|
axios.defaults.baseURL =
|
|
`${settings.ssl ? 'https' : 'http'}://${settings.ip}:${settings.port}`;
|
|
|
|
const usage = () => {
|
|
console.error('Usage:');
|
|
console.error(' node bin/compactPad.js <padID>');
|
|
console.error(' node bin/compactPad.js <padID> --keep <N>');
|
|
process.exit(2);
|
|
};
|
|
|
|
const args = process.argv.slice(2);
|
|
if (args.length < 1 || args.length > 3) usage();
|
|
const padId = args[0];
|
|
|
|
let keepRevisions: number | null = null;
|
|
if (args.length === 3) {
|
|
if (args[1] !== '--keep') usage();
|
|
keepRevisions = Number(args[2]);
|
|
if (!Number.isInteger(keepRevisions) || keepRevisions < 0) {
|
|
console.error(`--keep expects a non-negative integer; got ${args[2]}`);
|
|
process.exit(2);
|
|
}
|
|
}
|
|
|
|
// get the API Key
|
|
const filePath = path.join(__dirname, '../APIKEY.txt');
|
|
const apikey = fs.readFileSync(filePath, {encoding: 'utf-8'}).trim();
|
|
|
|
(async () => {
|
|
const apiInfo = await axios.get('/api/');
|
|
const apiVersion: string | undefined = apiInfo.data.currentVersion;
|
|
if (!apiVersion) throw new Error('No version set in API');
|
|
|
|
// Pre-flight: show current revision count so operators can eyeball impact.
|
|
const countUri = `/api/${apiVersion}/getRevisionsCount?apikey=${apikey}&padID=${padId}`;
|
|
const countRes = await axios.get(countUri);
|
|
if (countRes.data.code !== 0) {
|
|
console.error(`getRevisionsCount failed: ${JSON.stringify(countRes.data)}`);
|
|
process.exit(1);
|
|
}
|
|
const before: number = countRes.data.data.revisions;
|
|
const strategy = keepRevisions == null ? 'collapse all' : `keep last ${keepRevisions}`;
|
|
console.log(`Pad ${padId}: ${before + 1} revision(s). Strategy: ${strategy}.`);
|
|
|
|
const params = new URLSearchParams({apikey, padID: padId});
|
|
if (keepRevisions != null) params.set('keepRevisions', String(keepRevisions));
|
|
const result = await axios.post(`/api/${apiVersion}/compactPad?${params.toString()}`);
|
|
if (result.data.code !== 0) {
|
|
console.error(`compactPad failed: ${JSON.stringify(result.data)}`);
|
|
process.exit(1);
|
|
}
|
|
|
|
// Post-flight: the pad is now compacted. Re-read the rev count so the
|
|
// operator sees concrete savings.
|
|
const afterRes = await axios.get(countUri);
|
|
const after: number | undefined = afterRes.data?.data?.revisions;
|
|
if (after != null) {
|
|
console.log(`Done. Pad ${padId}: ${after + 1} revision(s) remaining ` +
|
|
`(was ${before + 1}).`);
|
|
} else {
|
|
console.log('Done.');
|
|
}
|
|
})();
|