Typed payloads
Every source emits a Pydantic-typed payload at the extractor boundary — not loose HTML or a generic blob. This is what makes recall composition, per-consumer rendering, and the knowledge graph possible: structure is captured once and reused everywhere.
The payloads
The payloads form a discriminated union on `kind`, so a consumer can switch on one field:
| Payload | kind | Source |
|---|---|---|
| TweetPayload | "x" | X (tweets, QRT chains, X-Articles) |
| RedditPayload | "reddit" | Reddit (post + comment tree) |
| WikiPayload | "wiki" | Wikipedia |
| WebPayload | "web" | Generic web articles |
| YouTubePayload | "youtube" | YouTube (metadata + transcript) |
| PDFPayload | "pdf" | PDFs |
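As a rough illustration of switching on `kind`, the sketch below models three of the six payloads as a Pydantic v2 discriminated union. The class names and `kind` values come from the table; every other field (`text`, `title`, `body`) and the `describe` helper are hypothetical and are not taken from the daemon's `models.py`.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter


class TweetPayload(BaseModel):
    kind: Literal["x"] = "x"
    text: str  # hypothetical field


class RedditPayload(BaseModel):
    kind: Literal["reddit"] = "reddit"
    title: str  # hypothetical field


class WebPayload(BaseModel):
    kind: Literal["web"] = "web"
    title: str  # hypothetical field
    body: str  # hypothetical field


# Only three of the six payloads are shown to keep the sketch short.
Payload = Annotated[
    Union[TweetPayload, RedditPayload, WebPayload],
    Field(discriminator="kind"),
]

# The adapter reads `kind` first and validates against the matching model only.
payload_adapter = TypeAdapter(Payload)


def describe(payload: Union[TweetPayload, RedditPayload, WebPayload]) -> str:
    """Hypothetical consumer that branches on the single `kind` field."""
    if payload.kind == "x":
        return f"tweet: {payload.text}"
    if payload.kind == "reddit":
        return f"reddit post: {payload.title}"
    return f"web article: {payload.title}"


parsed = payload_adapter.validate_python(
    {"kind": "reddit", "title": "Ask me anything"}
)
print(describe(parsed))
```

Because the union is discriminated, validation errors point at the model selected by `kind` rather than listing failures for every member of the union.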
Shared primitives
Cross-platform building blocks are reused across payloads so the same concept has the same shape everywhere:
- `EngagementCounts`: likes / reposts / replies / etc., with a cross-platform `net_score` for Reddit/HN/SO-style up-minus-down scoring
- `UrlEntity`: a resolved link with its display and expanded forms
- `CommentNode`: a node in a recursive comment tree (Reddit threads nest these)
- media attachments referenced from the payload
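The shared primitives might look roughly like the following. This is a minimal sketch assuming Pydantic v2; the model names and `net_score` come from the list above, while the remaining field names (`display_url`, `expanded_url`, `author`, `body`, `engagement`, `replies`) and the `count_comments` helper are assumptions made for illustration.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class EngagementCounts(BaseModel):
    # Every counter is optional because not every platform reports every metric.
    likes: Optional[int] = None
    reposts: Optional[int] = None
    replies: Optional[int] = None
    net_score: Optional[int] = None  # up-minus-down, Reddit/HN/SO style


class UrlEntity(BaseModel):
    display_url: str  # hypothetical name for the display form
    expanded_url: str  # hypothetical name for the resolved form


class CommentNode(BaseModel):
    author: str  # hypothetical field
    body: str  # hypothetical field
    engagement: EngagementCounts = Field(default_factory=EngagementCounts)
    # Self-reference: each reply is itself a CommentNode, giving a recursive tree.
    replies: List["CommentNode"] = Field(default_factory=list)


def count_comments(node: CommentNode) -> int:
    """Count this node plus all of its nested replies."""
    return 1 + sum(count_comments(child) for child in node.replies)


thread = CommentNode(
    author="alice",
    body="Top-level comment",
    engagement=EngagementCounts(net_score=12),
    replies=[
        CommentNode(author="bob", body="First reply"),
        CommentNode(author="carol", body="Second reply"),
    ],
)
print(count_comments(thread))
```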
X-specific structure
X carries richer structure than a flat tweet, so it has dedicated models:
- `XArticle`: a long-form X Article, block-structured
- `XArticleBlock`: an individual block (paragraph, heading, list, table, …)
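To make "block-structured" concrete, here is a hedged sketch of how an article could be modeled and walked block by block, assuming Pydantic v2. Only the names `XArticle` and `XArticleBlock` come from this page; the `type`, `text`, `items`, `title`, and `blocks` fields and the `article_to_markdown` helper are hypothetical. The sketch covers three block types and leaves out tables and any others.

```python
from typing import List, Literal

from pydantic import BaseModel, Field


class XArticleBlock(BaseModel):
    # Hypothetical shape: the real model supports more block types (e.g. tables).
    type: Literal["paragraph", "heading", "list"]
    text: str = ""
    items: List[str] = Field(default_factory=list)


class XArticle(BaseModel):
    title: str  # hypothetical field
    blocks: List[XArticleBlock]  # hypothetical field


def article_to_markdown(article: XArticle) -> str:
    """Render each block according to its type, in document order."""
    parts = [f"# {article.title}"]
    for block in article.blocks:
        if block.type == "heading":
            parts.append(f"## {block.text}")
        elif block.type == "list":
            parts.append("\n".join(f"- {item}" for item in block.items))
        else:
            parts.append(block.text)
    return "\n\n".join(parts)


article = XArticle(
    title="Title",
    blocks=[
        XArticleBlock(type="heading", text="Intro"),
        XArticleBlock(type="paragraph", text="Hello."),
        XArticleBlock(type="list", items=["a", "b"]),
    ],
)
print(article_to_markdown(article))
```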
Why typed
- Recall composes an embed-text per source from the typed fields, not a raw dump (see Recall).
- Rendering is deterministic from the payload — Markdown, JSON, or future HTML are all derived views (see Substrate overview).
- The Obsidian plugin’s `types.ts` mirrors these models additively, staying in lock-step with the daemon.
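The first two points can be sketched as small pure functions over a typed payload. This is an illustration only, assuming Pydantic v2: the `WebPayload` fields and the `embed_text` and `to_json` helpers are hypothetical and do not describe how Recall or the renderers are actually implemented.

```python
from typing import Literal

from pydantic import BaseModel


class WebPayload(BaseModel):
    kind: Literal["web"] = "web"
    title: str  # hypothetical field
    body: str  # hypothetical field
    url: str  # hypothetical field


def embed_text(payload: WebPayload) -> str:
    """Compose embed text from selected typed fields instead of a raw dump."""
    return f"{payload.title}\n\n{payload.body}"


def to_json(payload: WebPayload) -> str:
    """A derived JSON view; the payload remains the single source of truth."""
    return payload.model_dump_json()


page = WebPayload(
    title="Typed payloads",
    body="Structure is captured once and reused everywhere.",
    url="https://example.com/typed-payloads",
)

print(embed_text(page))

# The JSON view round-trips back to an equal payload, so no structure is lost.
restored = WebPayload.model_validate_json(to_json(page))
print(restored == page)
```

Both functions depend only on the payload's fields, so the same payload always yields the same output.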
The authoritative definitions live in the daemon’s `models.py`; the design rationale is in ADR-0009.