* feat(pad): compactHistory() + compactPad CLI for DB-size reclaim
Fixes#6194. Long-lived pads with heavy edit history dominate the DB —
the issue describes a ~400 MB Postgres after two months with ~100
users. Etherpad keeps every revision forever, and removing arbitrary
middle revisions is unsafe because state is reconstructed by composing
forward from key revisions.
What's safe: collapse the full history into a single base revision
that reproduces the current atext. The existing `copyPadWithoutHistory`
already does this for a new pad ID — this PR lifts that same changeset
pattern into an in-place operation and wires up an admin CLI.
- `Pad.compactHistory(authorId?)` (src/node/db/Pad.ts): composes the
current atext into one base changeset, deletes all existing rev
records, clears saved-revision bookmarks, and appends the new rev 0.
Text, attributes, and chat history are preserved; saved-revision
pointers are cleared. Returns the number of revisions removed.
- `API.compactPad(padID, authorId?)` (src/node/db/API.ts): public-API
wrapper around compactHistory. Reports `{removed}` so callers can
log savings.
- `APIHandler.ts`: register `compactPad` under a new `1.3.1` version,
bump `latestApiVersion`.
- `bin/compactPad.ts`: admin CLI. Reports the current revision count,
calls compactPad via the HTTP API, and prints how many revisions
were dropped.
- `src/tests/backend/specs/compactPad.ts`: four backend tests cover
the empty-pad no-op, the text-preservation + head=0 contract,
saved-revision cleanup, and that subsequent edits continue to
append cleanly on top of the collapsed base.
The operation is destructive so admins must opt in explicitly; the CLI
prints the before-count, and the recommended pre-flight is an
`.etherpad` export (backup).
Closes#6194
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(compact): delegate to copyPadWithoutHistory via temp-pad swap
The initial compactHistory() implementation built a custom base
changeset and re-ran appendRevision against a reset atext — but the
changeset was packed with oldLength=2 (matching copyPadWithoutHistory's
dest-pad init state) while the reset atext was only length 1, so
applyToText tripped its "mismatched apply: 1 / 2" assertion and every
test failed with a Changeset corruption error.
Switch to the tested path instead: copy the pad via
copyPadWithoutHistory to a uniquely-named temp pad (inherits all its
attribute/pool/changeset correctness), read the temp pad's rev records
back, delete the old ones under our pad's ID, write the new records in
their place, update in-memory state to match, and remove the temp pad.
Errors at any step fall through with a best-effort temp-pad cleanup.
Contract shifts slightly: the collapsed pad is head<=1 rather than
head=0, matching the shape of a freshly-imported pad (seed rev 0 +
content rev 1). Tests updated to assert that invariant plus
text-preservation, saved-revision cleanup, and append-after-compact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(6194): match the head<=1 post-compact contract
Tests previously asserted head=0 exactly after compaction; the
temp-pad-swap path lands at head=1 (one seed rev plus one content
rev) matching the shape of a freshly-imported pad. Relax the
assertions to and derive the removed-count from
before-head minus after-head, so the tests still catch regressions in
text-preservation, saved-revision cleanup, and append-after-compact
without being tied to the exact implementation shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(6194): wrap existing Cleanup instead of duplicating it
Develop already ships a working revision-cleanup path under
`src/node/utils/Cleanup.ts` with two public helpers —
`deleteAllRevisions(padId)` (collapse full history via
copyPadWithoutHistory) and `deleteRevisions(padId, keepRevisions)`
(keep the last N). The admin-settings UI wires these up but neither
is exposed on the public API, and there's no CLI for operators who
want to run compaction outside the web UI. That's the gap this PR
now fills.
Changes from the prior revision of this PR:
- Drop `pad.compactHistory()` — it re-implemented what
`Cleanup.deleteAllRevisions` already does. Remove the duplicate.
- `API.compactPad(padID, keepRevisions?)` now delegates to Cleanup:
• keepRevisions null/undefined → deleteAllRevisions (full collapse)
• keepRevisions >= 0 → deleteRevisions(N) (keep last N)
Returns {ok, mode: 'all' | 'keepLast', keepRevisions?}.
- APIHandler `1.3.1`: signature updated to take `keepRevisions`
instead of `authorId`.
- `bin/compactPad.ts`: accepts `--keep N` for the keep-last mode,
shows before/after revision counts so operators see concrete
savings.
- Backend tests rewritten around the public API surface (mode
reporting, text preservation, input validation) rather than
internal method plumbing that no longer exists.
Net: strictly a thin public-API and CLI veneer over already-tested
Cleanup helpers. No new low-level logic.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(6194): assert content markers, not byte-exact atext
Cleanup.deleteAllRevisions internally calls copyPadWithoutHistory
twice (src → tempId, tempId → src with force=true), and each round
trip normalizes trailing whitespace. That meant my byte-exact
atext.text assertion failed in CI:
expected: '...line 3\n\n\n'
actual: '...line 3\n'
Swap the comparisons to use content markers (marker-alpha / beta /
gamma, keep-line-N). The test still catches the real regressions —
if compactPad lost content those markers would disappear — without
coupling to whitespace quirks of the existing Cleanup implementation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(6194): correct API param + document compactPad in http_api docs
The 1.3.1 entry in APIHandler registered `['padID', 'authorId']`, but
`API.compactPad` takes `(padID, keepRevisions)` and the CLI sends a
`keepRevisions` query param. APIHandler.handle dispatches by URL field
name, so the previous wiring silently dropped `keepRevisions` and never
ran the keep-last branch over HTTP.
- Register `['padID', 'keepRevisions']` so the handler forwards the
CLI/HTTP arg into the API function.
- Add HTTP-level dispatch tests that hit `/api/1.3.1/compactPad` with
and without `keepRevisions`. The direct `api.compactPad()` tests
bypass the handler and would have missed this regression.
- Document compactPad in `doc/api/http_api.md` and `http_api.adoc`,
and bump the documented latest version from 1.3.0 to 1.3.1 to match
`latestApiVersion`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(6194): add bin/compactAllPads for per-instance bulk compaction
`bin/compactPad <padID>` covers the case where you know which pad is
fat. For "reclaim space across the whole instance," composing
`listAllPads` + `compactPad` yourself is annoying; this script does it.
- Walks every pad on the instance and compacts it (full collapse, or
`--keep N` keep-last).
- Per-pad failures don't abort the run — they're logged, counted, and
the script exits 1 if any failed.
- `--dry-run` lists pads + revision counts without writing anything,
so operators can scope impact before committing.
- Reports `before → after` per pad and a total reclaimed count.
Deliberately not adding a `compactAllPads` HTTP API: bulk compaction
over a single HTTP request means one giant response and a long-held
connection. Operators who want this should run it locally, where they
can see progress and kill it cleanly. Staleness gating ("only pads
older than X days") is tracked separately as a follow-up.
Also registers `compactPad` and `compactAllPads` script aliases in
`bin/package.json` so they show up next to the other admin CLIs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(6194): cover the bin/compactAllPads loop logic
Previous commit added the script but only exercised it by hand. The
loop itself — error tolerance, dry-run gating, keep-last passthrough,
the empty-instance and listAllPads-failure paths — had no automated
coverage.
- Refactor compactAllPads.ts to export `runCompactAll(api, opts, logger)`
and `parseArgs(argv)`. The CLI shell wires them up to axios+APIKEY
for production; tests use an in-memory `CompactAllApi` so we don't
need to stand up the apikey-auth path in mocha.
- Add 9 specs covering: arg parsing, full-collapse iteration,
--keep N passthrough, --dry-run skipping writes, single-pad failure
not aborting the run, pre-flight count failure tolerated, a
listAllPads failure short-circuiting cleanly, the empty-instance
no-op, and a final end-to-end test that runs `runCompactAll`
against the real `/api/1.3.1/compactPad` handler over supertest+JWT
to catch contract drift between the CompactAllApi shape and the
HTTP endpoints.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(6194): address Qodo review — gate, integer check, SSL
Three valid concerns from the Qodo review on 75a08a13:
1. **cleanup.enabled gate.** The admin/Cleanup-socket path checks
`settings.cleanup.enabled` before doing anything destructive; the
public API was bypassing that gate. Now `compactPad` mirrors the
admin path's check and returns a clear apierror when disabled, so
exposing the API doesn't accidentally widen the cleanup-opt-in
surface.
2. **Number.isFinite → Number.isInteger.** `2.5` was finite and
non-negative, so the old check let it through into
`Cleanup.deleteRevisions`, which does revision-index arithmetic
that assumes integer math. Reject at the API boundary instead of
silently misbehaving.
3. **SSL-aware baseURL in the bin scripts.** Other bin scripts
hardcode `http://`, but the rest of the codebase uses
`settings.ssl ? 'https' : 'http'`. The compact CLIs now do the
same, so they work against HTTPS deployments. (Other bin scripts
carry the same bug but fixing them is out of scope for this PR.)
Tests:
- New spec: `rejects fractional keepRevisions` (2.5 with the old
check passed; the new one rejects).
- New spec: `refuses to run when cleanup.enabled is false`. The
existing API tests opt in via a before-hook + restore, so they
still cover the success path under the new gate.
- API docs (`http_api.md` + `http_api.adoc`) document the gate and
the new error message.
Skipped Qodo concerns:
- "Wrong compactPad parameters" — already fixed in 26e12ff7
(the param map now correctly says `keepRevisions`, not `authorId`).
- "Unbounded revision deletions" / "No session eviction" / changeset
base-length / padCreate hook — these all targeted the earlier
on-Pad implementation that was refactored away. The current code
wraps `Cleanup.deleteAllRevisions` / `deleteRevisions`, which
already handle concurrency, locking, and hook semantics.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: PR1 GDPR deletion-controls design spec
First of five GDPR PRs tracked in #6701. PR1 covers deletion controls:
one-time deletion token, allowPadDeletionByAllUsers flag, authorisation
matrix for handlePadDelete and the REST deletePad endpoint, a single
token-display modal for browser pad creators, and test coverage.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: PR1 GDPR deletion-controls implementation plan
13 TDD-structured tasks covering PadDeletionManager unit tests, socket
+ REST three-way auth, clientVars wiring, one-time token modal,
delete-with-token UI, Playwright coverage, and PR handoff.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(gdpr): scaffolding for pad deletion tokens
PadDeletionManager stores a sha256-hashed per-pad deletion token and
verifies it with timing-safe comparison. createPad / createGroupPad
return the plaintext token once on first creation, and Pad.remove()
cleans it up. Gated behind the new allowPadDeletionByAllUsers flag
which defaults to false to preserve existing behaviour.
Part of #6701 (GDPR PR1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix+test(gdpr): lazy DB access in PadDeletionManager + unit tests
Capturing DB.db at module-load time was null until DB.init() ran, which
broke importing the module outside a live server (including from the
test runner). Switch to DB.db.* at call time and add unit tests
exercising create/verify/remove plus timing-safe comparison.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(gdpr): three-way auth for socket PAD_DELETE
Creator cookie → valid deletion token → allowPadDeletionByAllUsers flag.
Anyone else still gets the existing refusal shout.
* feat(gdpr): optional deletionToken on programmatic deletePad
* feat(gdpr): advertise optional deletionToken on REST deletePad
* test(gdpr): cover deletePad authorisation matrix via REST
* feat(gdpr): surface padDeletionToken in clientVars for creators only
Revision-0 author on their first CLIENT_READY visit receives the
plaintext token; all subsequent CLIENT_READYs receive null because
createDeletionTokenIfAbsent is idempotent. Readonly sessions and any
other user never see the token.
* i18n(gdpr): strings for deletion-token modal and delete-with-token flow
* feat(gdpr): token modal + delete-with-token disclosure markup
* feat(gdpr): show deletion token once, allow delete via recovery token
* style(gdpr): modal + delete-with-token layout
* test(gdpr): Playwright coverage for deletion-token modal + delete-with-token
* fix(test): auto-dismiss deletion-token modal in goToNewPad helper
The token modal introduced in PR1 blocks clicks for every Playwright
test that creates a new pad via the shared helper. Add a one-line
dismissal so unrelated tests keep passing, and have the deletion-token
spec navigate inline via newPadKeepingModal() when it needs the modal
open to capture the token.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(test): dismiss deletion-token modal without focus transfer
Clicking the ack button transferred focus out of the pad iframe, which
made subsequent keyboard-driven tests (Tab / Enter) silently miss the
editor. Swap the click for a page.evaluate() that hides the modal and
nulls clientVars.padDeletionToken directly, leaving focus where it was.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(gdpr): PadDeletionManager race + document createPad/deletePad
Qodo review:
- createDeletionTokenIfAbsent() was a non-atomic read-then-write. Two
concurrent callers for the same pad could both return different
plaintext tokens while only the later hash was stored, leaving the
first caller with an unusable recovery token. Serialise per-pad via a
Promise chain and add a regression test that fires 8 concurrent
calls and asserts exactly one plaintext is emitted and validates.
- doc/api/http_api.md now documents createPad returning deletionToken
and deletePad accepting the optional deletionToken parameter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(gdpr): always render delete-with-token in settings popup
The rebase onto develop placed the delete-pad-with-token details inside
the pad-settings-section conditional, which is only rendered when
enablePadWideSettings is true AND the section is toggled visible.
Second-device recovery (typing the captured token on a fresh browser)
must work without pad-wide settings enabled, so move the details out
to sit alongside the existing pad_deletion_token.spec.ts expectations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(gdpr): require valid token when supplied, gate on auth, harden a11y/i18n
- PadMessageHandler: a supplied deletion token must validate; do not fall
back to the creator-cookie path when the token is wrong (was deleting
the pad anyway when the creator pasted a wrong token into the field).
- Skip token issuance + UI when requireAuthentication is on (creator
identity is stable, recovery token is redundant noise).
- Server emits messageKey instead of hardcoded English; both shout
handlers (inline alert and global gritter) localize via html10n.
- Suppress the global "Admin message" gritter for pad.deletionToken.*
shouts to avoid the "Admin message: undefined" duplicate.
- Token-modal a11y: role=dialog, aria-modal, aria-labelledby/describedby,
visually-hidden label on the token input, aria-live on Copy, focus to
the token input on open and restore on dismiss.
- Style the "Delete Pad with Token" disclosure to match the Delete pad
button; align the Copy/value row; pad the disclosure label.
Tests: Playwright now covers the creator-with-wrong-token path, asserts
no "Admin message" / "undefined" gritter on denial; backend API test
covers requireAuthentication suppressing the token.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This will be a breaking change for some people.
We removed all internal password control logic. If this affects you, you have two options:
1. Use a plugin for authentication and use session based pad access (recommended).
1. Use a plugin for password setting.
The reasoning for removing this feature is to reduce the overall security footprint of Etherpad. It is unnecessary and cumbersome to keep this feature and with the thousands of available authentication methods available in the world our focus should be on supporting those and allowing more granual access based on their implementations (instead of half assed baking our own).
New feature to copy a pad without copying entire history. This is useful to perform a low CPU intensive operation while still copying current pad state.