Skip to content

Refetch & validate

Refetch

A capture has three independently-attemptable dimensions. khiipd refetch lets you re-run any one of them:

Terminal window
khiipd refetch 01JX9... # re-extract: runs the extractor again and creates
# a new capture that supersedes the old one
khiipd refetch 01JX9... --media # re-walk the media-fetcher registry in place
khiipd refetch 01JX9... --wayback # re-submit to the Wayback Machine in place

Three dimensions

CommandDimensionEffect
khiipd refetch <id>extraction (default)Re-runs the extractor against the original URL and writes a new capture; the old one is marked superseded (below).
khiipd refetch <id> --mediamediaRe-walks the media-fetcher registry on the existing capture in place — same id, same vault file. For retrying a media download that previously failed.
khiipd refetch <id> --waybackwaybackRe-submits the canonical URL to the Wayback Machine in place, updating the capture’s archive_urls.

--media and --wayback are mutually exclusive. All three map to POST /api/v1/captures/{id}/refetch?dimension={extraction|media|wayback}, and the dimensions are independently re-attemptable per ADR-0010 — a failed media fetch doesn’t force you to re-run the whole extraction.

The supersession chain

Re-extraction is append-only. Rather than overwriting the old capture, Khiip:

  1. Runs a fresh capture (bypassing dedup) through the full pipeline — extraction, media-fetch, Wayback, Source-tier, vault write, SQLite, embeddings.
  2. Sets a superseded_by pointer on the old capture’s row, pointing at the new id.

The old capture and its vault file stay on disk, so the bitemporal history is preserved — you can still answer “what did this say when I first captured it.” A later dedup (or a re-capture of the same URL) resolves to the most-recent un-superseded capture — the one whose superseded_by is empty. The CLI prints both ids on success:

✓ re-extracted; new capture id: 01JXA…
old (superseded): 01JX9…

The in-place dimensions (--media / --wayback) don’t create a new capture or touch the supersession chain — they re-render the existing capture’s body and update its payload.

Validate

Terminal window
khiipd validate

validate is a read-only consistency check between your Markdown vault (canonical) and the SQLite index (a derived cache) per ADR-0009 §C3. It never mutates either side. It checks four invariants:

  1. Vault ↔ SQLite reconciliation. Every index row has a matching vault file and vice versa — matched by the id: frontmatter key, not the filename. Catches a manual delete, a half-finished restore, or a vault file with no id:.
  2. Payload round-trip. Every vault file’s structured_payload deserializes back through the typed discriminated union — guards against schema drift between extractor versions.
  3. Status consistency. A success capture has no error recorded; a failed-permanent capture does (the failure reason is the only signal on a tombstone). Per ADR-0010.
  4. Media sandbox. Every media.local_path resolves to a real file inside the vault — no out-of-vault paths, no dangling references.

It exits 0 when everything passes and 1 when there are violations, printing a per-invariant summary:

checked 142 captures + 142 vault files
✓ all invariants pass

Add --json for a machine-readable report (handy in a backup-restore or CI script), or --vault-path / --db-path to point at a non-default location.

Run it after an upgrade, after restoring a backup, or any time the vault and index might have drifted.