Substrate overview
Khiip is one canonical substrate with many surfaces. Capture flows through three tiers, and consumers read whichever tier they need.
Three tiers
- Source tier: the raw bytes as fetched (HTML, JSON, media), gzipped and preserved under your configurable data_root. This is insurance against upstream rot: if a page changes or disappears, you still have what you captured.
- Payload tier: a Pydantic-typed payload emitted at the extractor boundary, plus the rendered Markdown + YAML frontmatter that is the canonical source of truth in your vault.
- Render tier: a Renderer Protocol + Registry that turns a payload into whatever a given consumer wants: per-source Markdown, legacy Markdown, JSON, or vault frontmatter.
This separation is what makes Khiip the layer, not the destination: the typed payload is captured once, and any number of renderings derive from it.
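The "capture once, render many" idea can be sketched with a protocol and a registry. This is a minimal illustration, not Khiip's actual code: ArticlePayload, RENDERERS, register, and render are hypothetical names, the format keys are invented, and a frozen dataclass stands in for the Pydantic model so the example runs on the standard library alone.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Protocol


@dataclass(frozen=True)
class ArticlePayload:
    """Hypothetical typed payload; a stand-in for a Pydantic model."""
    title: str
    url: str
    body: str


class Renderer(Protocol):
    """Any callable that turns a payload into text."""

    def __call__(self, payload: ArticlePayload) -> str: ...


# Registry mapping a format name to its renderer (names are illustrative).
RENDERERS: Dict[str, Renderer] = {}


def register(name: str) -> Callable[[Renderer], Renderer]:
    """Decorator that adds a renderer to the registry under `name`."""
    def decorator(fn: Renderer) -> Renderer:
        RENDERERS[name] = fn
        return fn
    return decorator


@register("json")
def render_json(payload: ArticlePayload) -> str:
    return json.dumps(asdict(payload), sort_keys=True)


@register("markdown")
def render_markdown(payload: ArticlePayload) -> str:
    return f"# {payload.title}\n\n{payload.body}\n"


def render(payload: ArticlePayload, fmt: str) -> str:
    """Look up the renderer for `fmt` and apply it to the payload."""
    try:
        renderer = RENDERERS[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt!r}") from None
    return renderer(payload)
```

Adding a new output shape in this sketch means registering one more function; the payload type and the capture path stay untouched.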
The ?format= selector
The REST surface exposes the render tier directly. GET /api/v1/captures/{id}?format=…
returns the same capture as capture JSON, payload JSON, vault Markdown, or legacy
Markdown — so an agent and a human reader can each ask for the shape they want.
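As a rough illustration of the selector, the helper below only builds the request URL. The base URL and the format value "markdown" are assumptions made for this example; the accepted format names and the address the API listens on are not stated here, so check them against your installation.

```python
from urllib.parse import quote, urlencode

# Assumption: the host and port are placeholders, not a documented default.
BASE_URL = "http://localhost:8000"


def capture_url(capture_id: str, fmt: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /api/v1/captures/{id}?format=<fmt>."""
    path = f"/api/v1/captures/{quote(capture_id, safe='')}"
    return f"{base_url}{path}?{urlencode({'format': fmt})}"


# "markdown" is a hypothetical format value used only for illustration.
print(capture_url("abc123", "markdown"))
```

The same capture id with a different format value would address the same stored payload in a different rendering.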
Storage
- Vault: Markdown + YAML frontmatter at ~/khiip-vault/captures/<source>/: canonical, human- and tool-readable, grep-able.
- SQLite index: a derived cache at ~/.local/share/khiip/index.db for fast listing, recall, and the knowledge graph. khiipd validate checks the two stay consistent.
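To make the "derived cache" relationship concrete, here is one way a vault-versus-index consistency check could look. It is not how khiipd validate is implemented: the captures table, its path column, and the find_drift helper are all hypothetical, and a real check would likely compare more than file paths.

```python
import sqlite3
from pathlib import Path
from typing import Set, Tuple


def find_drift(vault_dir: Path, conn: sqlite3.Connection) -> Tuple[Set[str], Set[str]]:
    """Return (files missing from the index, index rows whose file is missing).

    Assumes a hypothetical table `captures(path TEXT)` holding paths
    relative to the vault directory.
    """
    on_disk = {p.relative_to(vault_dir).as_posix() for p in vault_dir.rglob("*.md")}
    indexed = {row[0] for row in conn.execute("SELECT path FROM captures")}
    return on_disk - indexed, indexed - on_disk
```

Because the vault is canonical, drift in either direction would be repaired in this model by rebuilding index rows from the Markdown files, never the reverse.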
Bitemporal model
Every capture records both recorded_at (when Khiip fetched it) and valid_from
(when the data was true in the world). Point-in-time queries can answer “what did this
say on date X.” Corrections are append-only — new captures supersede old ones rather
than overwriting them.
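A point-in-time lookup over an append-only log might look like the sketch below. Only recorded_at and valid_from come from the model described above; the Capture class, the key and content fields, the as_of function, and the tie-breaking rule (the most recently recorded capture wins among those with the same valid_from) are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Capture:
    key: str               # hypothetical identifier for the captured resource
    content: str
    valid_from: datetime   # when the data was true in the world
    recorded_at: datetime  # when Khiip fetched it


def as_of(
    captures: Iterable[Capture],
    key: str,
    valid_at: datetime,
    known_at: Optional[datetime] = None,
) -> Optional[Capture]:
    """Return what `key` said at `valid_at`, optionally as known at `known_at`."""
    candidates = [
        c for c in captures
        if c.key == key
        and c.valid_from <= valid_at
        and (known_at is None or c.recorded_at <= known_at)
    ]
    if not candidates:
        return None
    # Latest valid_from wins; a later recording supersedes an earlier one.
    return max(candidates, key=lambda c: (c.valid_from, c.recorded_at))


log = [
    Capture("page", "v1", datetime(2024, 1, 1), datetime(2024, 1, 2)),
    Capture("page", "v2", datetime(2024, 3, 1), datetime(2024, 3, 1)),
    # A correction to v1, appended later rather than overwriting it.
    Capture("page", "v1-corrected", datetime(2024, 1, 1), datetime(2024, 4, 1)),
]
```

With this log, asking about 1 February 2024 returns the corrected capture, while the same question with known_at set to 15 February 2024 returns the original v1, since the correction had not been recorded yet.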
Where to go next
- Typed payloads — the per-source payload models
- Knowledge graph — typed edges over captures
- Failure handling (P-δ) — capture-what-we-can