Substrate overview

Khiip is one canonical substrate with many surfaces. Captures flow through three tiers, and consumers read whichever tier they need.

Three tiers

  1. Source tier — the raw bytes as fetched (HTML, JSON, media), gzipped and preserved under your configurable data_root. This is insurance against upstream rot: if a page changes or disappears, you still have what you captured.
  2. Payload tier — a Pydantic-typed payload emitted at the extractor boundary, plus the rendered Markdown + YAML frontmatter that is the canonical source of truth in your vault.
  3. Render tier — a Renderer Protocol + Registry that turns a payload into whatever a given consumer wants: per-source Markdown, legacy Markdown, JSON, or vault frontmatter.

This separation is what makes Khiip the layer, not the destination: the typed payload is captured once, and any number of renderings derive from it.
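The "capture once, render many" idea can be sketched as a small Protocol plus a registry keyed by format name. This is a minimal illustration, not Khiip's actual code: `ArticlePayload`, `JsonRenderer`, `MarkdownRenderer`, `REGISTRY`, and the format keys `"json"` and `"markdown"` are all hypothetical names, and a standard-library dataclass stands in for the Pydantic model so the example runs without extra dependencies.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class ArticlePayload:
    """Hypothetical typed payload (a stand-in for a Pydantic model)."""
    source: str
    title: str
    body: str


class Renderer(Protocol):
    """Anything with a matching render() method satisfies this Protocol."""
    def render(self, payload: ArticlePayload) -> str: ...


class JsonRenderer:
    def render(self, payload: ArticlePayload) -> str:
        return json.dumps(asdict(payload), sort_keys=True)


class MarkdownRenderer:
    def render(self, payload: ArticlePayload) -> str:
        # YAML frontmatter followed by the Markdown body.
        frontmatter = (
            f"---\nsource: {payload.source}\ntitle: {payload.title}\n---\n"
        )
        return f"{frontmatter}\n# {payload.title}\n\n{payload.body}\n"


# Hypothetical registry: format name -> renderer instance.
REGISTRY: Dict[str, Renderer] = {
    "json": JsonRenderer(),
    "markdown": MarkdownRenderer(),
}


def render(payload: ArticlePayload, fmt: str) -> str:
    """Look up the renderer for fmt and apply it to the payload."""
    try:
        renderer = REGISTRY[fmt]
    except KeyError:
        raise ValueError(f"unknown format: {fmt!r}") from None
    return renderer.render(payload)
```

Because the payload is built once and every renderer only reads it, adding a new output shape in this sketch means registering one more renderer rather than touching the capture path.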

The ?format= selector

The REST surface exposes the render tier directly. GET /api/v1/captures/{id}?format=… returns the same capture in one of four shapes: capture JSON, payload JSON, vault Markdown, or legacy Markdown. An agent and a human reader can each ask for the shape they want.
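As a rough sketch of how a client might build such a request URL: the path and the format query parameter come from the text above, while the base URL, the capture id, and the format value `"json"` are placeholders. The accepted format values are not listed here, so check the API reference before relying on any of them.

```python
from urllib.parse import quote, urlencode

# Assumption: a locally running server; the real host and port may differ.
BASE_URL = "http://localhost:8000"


def capture_url(capture_id: str, fmt: str, base_url: str = BASE_URL) -> str:
    """Build the URL for one capture rendered in the requested format."""
    # Percent-encode the id so characters like "/" cannot alter the path.
    path = f"/api/v1/captures/{quote(capture_id, safe='')}"
    return f"{base_url}{path}?{urlencode({'format': fmt})}"


# "abc123" and "json" are placeholder values, not documented ones.
url = capture_url("abc123", "json")
```

The same capture id with a different format value yields a different rendering of the same underlying payload, which is the point of exposing the render tier over REST.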

Storage

  • Vault: Markdown + YAML frontmatter at ~/khiip-vault/captures/<source>/ — canonical, human- and tool-readable, grep-able.
  • SQLite index: a derived cache at ~/.local/share/khiip/index.db for fast listing, recall, and the knowledge graph. khiipd validate checks that the vault and the index stay consistent.
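To make "derived cache" concrete, here is a minimal sketch of the kind of drift check a validator could run: list the Markdown files in the vault, list the paths the index knows about, and report the difference in both directions. This is not how khiipd validate is implemented. The `captures` table and its `path` column are a hypothetical schema invented for this example.

```python
import sqlite3
from pathlib import Path
from typing import Set, Tuple


def find_drift(
    vault_dir: Path, conn: sqlite3.Connection
) -> Tuple[Set[str], Set[str]]:
    """Return (files missing from the index, index rows with no file).

    Assumes a hypothetical table captures(path TEXT) that stores paths
    relative to the vault root, using forward slashes.
    """
    on_disk = {
        p.relative_to(vault_dir).as_posix() for p in vault_dir.rglob("*.md")
    }
    indexed = {row[0] for row in conn.execute("SELECT path FROM captures")}
    return on_disk - indexed, indexed - on_disk
```

Since the vault is canonical, a sensible repair in either direction is to rebuild index rows from the files, never the reverse.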

Bitemporal model

Every capture records both recorded_at (when Khiip fetched it) and valid_from (when the data was true in the world). Point-in-time queries can answer “what did this say on date X.” Corrections are append-only — new captures supersede old ones rather than overwriting them.
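A point-in-time lookup over an append-only log can be sketched as follows. The field names recorded_at and valid_from come from the text above; the `Capture` class, its `key` and `value` fields, and the `as_of` function are hypothetical, and the tie-breaking rule (the most recently recorded capture wins among those with the same valid_from) is an assumption about how supersession works.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Capture:
    key: str               # hypothetical: identifies what was captured
    value: str             # hypothetical: the captured content
    valid_from: datetime   # when the data was true in the world
    recorded_at: datetime  # when it was fetched


def as_of(
    captures: Iterable[Capture],
    key: str,
    valid_at: datetime,
    known_at: Optional[datetime] = None,
) -> Optional[Capture]:
    """Answer: what did `key` say on `valid_at`?

    If known_at is given, captures recorded after it are ignored, which
    reproduces the answer as it would have looked at that moment.
    """
    candidates = [
        c for c in captures
        if c.key == key
        and c.valid_from <= valid_at
        and (known_at is None or c.recorded_at <= known_at)
    ]
    if not candidates:
        return None
    # Latest valid_from wins; later recordings supersede earlier ones.
    return max(candidates, key=lambda c: (c.valid_from, c.recorded_at))


log = [
    Capture("u", "v1", datetime(2024, 1, 1), datetime(2024, 1, 2)),
    Capture("u", "v1-corrected", datetime(2024, 1, 1), datetime(2024, 1, 10)),
    Capture("u", "v2", datetime(2024, 2, 1), datetime(2024, 2, 1)),
]
```

Nothing in the log is ever rewritten: the correction is a second row with the same valid_from and a later recorded_at, so both the corrected and the original answer remain retrievable.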

Where to go next