Substrate overview

Khiip is one canonical substrate with many surfaces. Captured data flows through three tiers, and each consumer reads whichever tier it needs.

Three tiers

Source tier — the raw bytes as fetched (HTML, JSON, media), gzipped and preserved under your configurable data_root. This is insurance against upstream rot: if a page changes or disappears, you still have what you captured.
Payload tier — a Pydantic-typed payload emitted at the extractor boundary, plus the rendered Markdown + YAML frontmatter that is the canonical source of truth in your vault.
Render tier — a Renderer Protocol + Registry that turns a payload into whatever a given consumer wants: per-source Markdown, legacy Markdown, JSON, or vault frontmatter.
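A minimal sketch of the render-tier pattern, assuming a Protocol-based renderer interface and a name-keyed registry. `ArticlePayload`, `JsonRenderer`, `VaultMarkdownRenderer`, `RendererRegistry`, and the format names `"json"` and `"vault-md"` are all hypothetical. The payload is a plain dataclass standing in for Khiip's Pydantic models, so the example runs on the standard library alone. This illustrates the idea and is not Khiip's actual implementation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class ArticlePayload:
    # Hypothetical payload shape; Khiip's real payloads are Pydantic models.
    source: str
    title: str
    url: str
    body: str


class Renderer(Protocol):
    # Structural interface: any object with a matching render() qualifies.
    def render(self, payload: ArticlePayload) -> str: ...


class JsonRenderer:
    def render(self, payload: ArticlePayload) -> str:
        # Serialize every payload field; sort_keys keeps the output stable.
        return json.dumps(asdict(payload), sort_keys=True)


class VaultMarkdownRenderer:
    def render(self, payload: ArticlePayload) -> str:
        # json.dumps emits double-quoted strings, which YAML also accepts,
        # so values containing ':' or '#' do not break the frontmatter.
        meta = {"source": payload.source, "title": payload.title, "url": payload.url}
        lines = ["---"] + [f"{k}: {json.dumps(v)}" for k, v in meta.items()] + ["---"]
        return "\n".join(lines) + "\n\n" + payload.body + "\n"


class RendererRegistry:
    def __init__(self) -> None:
        self._renderers: Dict[str, Renderer] = {}

    def register(self, name: str, renderer: Renderer) -> None:
        # Refuse silent overrides so two renderers cannot claim one name.
        if name in self._renderers:
            raise ValueError(f"renderer {name!r} is already registered")
        self._renderers[name] = renderer

    def render(self, name: str, payload: ArticlePayload) -> str:
        # One typed payload in, any registered rendering out.
        if name not in self._renderers:
            raise KeyError(f"no renderer registered for {name!r}")
        return self._renderers[name].render(payload)


registry = RendererRegistry()
registry.register("json", JsonRenderer())
registry.register("vault-md", VaultMarkdownRenderer())
payload = ArticlePayload("example", "Hello: world", "https://example.com/a", "Body text.")
print(registry.render("vault-md", payload))
```

Keying renderers by name is what lets one typed payload fan out to many renderings: supporting a new consumer means registering another renderer, without touching the extractors that produce payloads.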

This separation is what makes Khiip the layer, not the destination: the typed payload is captured once, and any number of renderings derive from it.

The `?format=` selector

The REST surface exposes the render tier directly. `GET /api/v1/captures/{id}?format=…` returns the same capture as capture JSON, payload JSON, vault Markdown, or legacy Markdown, so an agent and a human reader can each ask for the shape they want.
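A hypothetical client helper for this endpoint. The base URL and port are assumptions, and because the accepted `format=` values are elided on this page, the `fmt` argument is left to the caller; substitute the values your Khiip version documents. `capture_url` and `fetch_capture` are illustrative names, not part of Khiip.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Hypothetical address; point this at wherever your Khiip REST surface listens.
BASE_URL = "http://127.0.0.1:8000"


def capture_url(capture_id: str, fmt: str, base: str = BASE_URL) -> str:
    # Encode the id as a single path segment (slashes included), then add ?format=.
    path = f"/api/v1/captures/{quote(capture_id, safe='')}"
    return f"{base}{path}?{urlencode({'format': fmt})}"


def fetch_capture(capture_id: str, fmt: str, base: str = BASE_URL, timeout: float = 10.0) -> str:
    # Same capture, different shape: only the format query parameter changes.
    request = Request(capture_url(capture_id, fmt, base), headers={"Accept": "*/*"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Against a running server, `fetch_capture(capture_id, fmt)` would return the response body as text. Only URL construction runs without a network.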

Storage

Vault: Markdown + YAML frontmatter at ~/khiip-vault/captures/<source>/ — canonical, human- and tool-readable, grep-able.
SQLite index: a derived cache at ~/.local/share/khiip/index.db for fast listing, recall, and the knowledge graph. `khiipd validate` checks that the two stay consistent.
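One way a vault-versus-index consistency check could work, sketched under two assumptions: the index has a `captures` table whose `path` column holds each capture's vault-relative path, and each capture is one `.md` file under `captures/<source>/`. The schema and the names `Drift` and `check_consistency` are hypothetical; this is not how `khiipd validate` is actually implemented.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import NamedTuple, Set


class Drift(NamedTuple):
    missing_from_index: Set[str]  # vault files the index does not know about
    dangling_in_index: Set[str]   # index rows whose vault file no longer exists


def vault_paths(vault_root: Path) -> Set[str]:
    # The vault is canonical: every captures/<source>/*.md file counts as a capture.
    captures = vault_root / "captures"
    return {p.relative_to(vault_root).as_posix() for p in captures.glob("*/*.md")}


def index_paths(db_path: Path) -> Set[str]:
    # Hypothetical schema: captures(path TEXT PRIMARY KEY, ...).
    with closing(sqlite3.connect(db_path)) as conn:
        return {row[0] for row in conn.execute("SELECT path FROM captures")}


def check_consistency(vault_root: Path, db_path: Path) -> Drift:
    on_disk = vault_paths(vault_root)
    indexed = index_paths(db_path)
    return Drift(on_disk - indexed, indexed - on_disk)
```

Because the vault is canonical and the index is derived, drift in either direction would normally be resolved on the index side, for example by rebuilding it from the vault.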

Bitemporal model

Every capture records both `recorded_at` (when Khiip fetched it) and `valid_from` (when the data became true in the world). Point-in-time queries can therefore answer “what did this say on date X?” Corrections are append-only: a new capture supersedes an older one rather than overwriting it.
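A minimal in-memory sketch of a bitemporal point-in-time lookup. `Capture`, `CaptureLog`, `subject`, `as_of`, and `known_at` are hypothetical names. The supersession rule (latest `valid_from` wins, and among equal `valid_from` values the later `recorded_at` wins) is an assumption about how append-only corrections resolve, not documented Khiip behavior.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Capture:
    subject: str           # hypothetical key for what was captured (e.g. a URL)
    recorded_at: datetime  # transaction time: when Khiip fetched it
    valid_from: datetime   # valid time: when the content became true in the world
    content: str


class CaptureLog:
    """Append-only store: corrections are new rows, never in-place edits."""

    def __init__(self) -> None:
        self._captures: List[Capture] = []

    def append(self, capture: Capture) -> None:
        self._captures.append(capture)

    def as_of(
        self,
        subject: str,
        valid_at: datetime,
        known_at: Optional[datetime] = None,
    ) -> Optional[Capture]:
        # Keep captures already valid at valid_at and, if known_at is given,
        # already recorded by then (ignore anything Khiip learned later).
        candidates = [
            c
            for c in self._captures
            if c.subject == subject
            and c.valid_from <= valid_at
            and (known_at is None or c.recorded_at <= known_at)
        ]
        if not candidates:
            return None
        # Assumed supersession rule: latest valid_from wins; for equal
        # valid_from, the later recording (a correction) wins.
        return max(candidates, key=lambda c: (c.valid_from, c.recorded_at))
```

Passing `known_at` answers the second bitemporal question, what Khiip believed at an earlier moment, because captures recorded after that moment are ignored.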

Where to go next

Typed payloads — the per-source payload models
Knowledge graph — typed edges over captures
Failure handling (P-δ) — capture-what-we-can