Series: 1 · Wax explained | 2 · Wax vs qmd | 3 · The chunk→fact bridge
Part 3 of 3 · the build

The chunk→fact bridge

qmd's retrieval feeding Wax's structured memory. A tool that reads an org‑mode corpus — devlogs, notes, session journals — chunks it along its own structure, and distills durable, bitemporal, citable EAV facts back into Wax. Grounded in a real tree: ~/dev/org/{devlogs,notes} and ~/dev/my/claude-journal.

Cites: green = the new bridge · amber = Wax · blue = qmd.

01 What the corpus actually is

All org. Markdown is vestigial here. Three shapes, each already semantically segmented.

| Sub‑corpus | Path shape | Structure the bridge exploits |
|---|---|---|
| Devlogs (481) | org/devlogs/<project>/<yyyy>/<mm>/<date>_<project>.org | #+SOURCE-FINGERPRINT: (a SHA, so a free content hash), #+TITLE, #+FILETAGS: :daily:summary:<proj>:, then * Summary / * Journal / ** HH:MM — / **** Key Decisions, with dense [[orgit-rev:…::<sha>]] commit links |
| Notes (93) | org/notes/<yyyy>/<slug>.org | :PROPERTIES: drawer (:ID:, :CREATED:), *Affects:* / *Severity:*, * Root cause / ** Layer N, #+begin_src / #+begin_example blocks |
| claude‑journal (249) | my/claude-journal/<project>/<yyyy>/<mm>/<ts>_<project>.org | #+PROJECT:, * Context / * User Context / * Reflections / * Observations / ** <obs> |

The author already split this into Decisions / Observations / Root‑cause / Reflections — high signal‑to‑noise. And project + date + kind live in the path and headers: free entity and time metadata, no inference needed.

02 The insight: org structure is the chunker

qmd guesses structure from # density. Your org tree hands it over.

qmd's job (Markdown)

qmd scores candidate break points by regex (H1=100, code‑fence=80, HR=60, paragraph=20…) and searches a ~200‑token window for the best cut, with squared‑distance decay (store.ts:106–242). It is reconstructing structure the author left implicit.
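To make the scoring idea concrete, here is a minimal Python sketch of a qmd-style cut search. It is not qmd's code: only the four weights quoted above come from the text, the window is measured in lines rather than tokens for brevity, and the decay shape `1 - (d/window)^2` is an assumption about what "squared-distance decay" means.

```python
import re

# Weights quoted in the text; patterns elided there ("…") are left out.
BREAK_PATTERNS = [
    (re.compile(r"^# "), 100),                  # H1
    (re.compile(r"^```"), 80),                  # code fence
    (re.compile(r"^(-{3,}|\*{3,})\s*$"), 60),   # horizontal rule
    (re.compile(r"^\s*$"), 20),                 # blank line = paragraph break
]

def best_cut(lines, target, window):
    """Return the line index near `target` with the highest decayed score.

    Assumed decay: score = weight * (1 - (distance / window) ** 2).
    Falls back to `target` itself when no break point scores above zero.
    """
    best_idx, best_score = target, 0.0
    lo = max(0, target - window)
    hi = min(len(lines), target + window + 1)
    for i in range(lo, hi):
        weight = next((w for pat, w in BREAK_PATTERNS if pat.match(lines[i])), 0)
        d = abs(i - target) / window
        score = weight * (1.0 - d * d)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx
```

A strong break slightly off target can beat a weak one right next to it, which is the point of weighting by both kind and distance.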

The bridge's job (org)

Prefer whole heading subtrees: a **** Key Decisions block is a chunk. Fall back to a qmd‑style windowed cut only inside an oversized subtree. Never split inside #+begin_…/#+end_… or :PROPERTIES:…:END: (the org analogue of qmd's code‑fence guard).
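The rules above can be sketched in Python roughly as follows. This is an illustrative sketch, not the bridge's implementation: the function names, the `max_chars` budget, and the choice to cut only at blank lines in the fallback are all assumptions, and a real version would use a proper org parser.

```python
import re

HEADING = re.compile(r"^(\*+) (.*)$")
# No-split regions named in the text. Simplification: any close marker
# ends any open region.
BLOCK_OPEN = re.compile(r"^\s*(#\+begin_|:PROPERTIES:)", re.IGNORECASE)
BLOCK_CLOSE = re.compile(r"^\s*(#\+end_|:END:)", re.IGNORECASE)

def parse_sections(text):
    """Split org text into [level, title, body_lines]; level 0 is the preamble."""
    sections, in_block = [[0, "", []]], False
    for line in text.splitlines():
        if in_block:
            in_block = BLOCK_CLOSE.match(line) is None
        elif BLOCK_OPEN.match(line):
            in_block = True
        else:
            m = HEADING.match(line)
            if m:  # a real heading (not one inside a block) starts a section
                sections.append([len(m.group(1)), m.group(2).strip(), []])
                continue
        sections[-1][2].append(line)
    return sections

def render(section):
    level, title, body = section
    return ([f"{'*' * level} {title}"] if level else []) + body

def split_safe(lines, max_chars):
    """Fallback for oversized sections: cut only at blank lines outside blocks."""
    parts, cur, size, in_block = [], [], 0, False
    for line in lines:
        if in_block:
            in_block = BLOCK_CLOSE.match(line) is None
        elif BLOCK_OPEN.match(line):
            in_block = True
        cur.append(line)
        size += len(line) + 1
        if size >= max_chars and not in_block and not line.strip():
            parts.append(cur)
            cur, size = [], 0
    return parts + ([cur] if cur else [])

def chunk_org(text, max_chars=1200):
    """Return (heading_path, chunk_text) pairs, preferring whole subtrees."""
    secs, out, path, i = parse_sections(text), [], [], 0
    while i < len(secs):
        level, title, body = secs[i]
        path = [p for p in path if p[0] < level] + ([(level, title)] if level else [])
        j = i + 1
        while j < len(secs) and secs[j][0] > level:
            j += 1  # sections i..j-1 form the subtree rooted at i
        names = tuple(t for _, t in path)
        subtree = "\n".join(ln for s in secs[i:j] for ln in render(s))
        if level and len(subtree) <= max_chars:
            out.append((names, subtree))  # whole subtree fits: one chunk
            i = j
        else:
            if any(ln.strip() for ln in body):  # skip heading-only sections
                out += [(names, "\n".join(p)) for p in split_safe(render(secs[i]), max_chars)]
            i += 1  # descend: children are visited on later iterations
    return out
```

Each chunk carries its heading path, so a later stage can route on it. A block that exceeds the budget with no safe cut point is emitted oversized rather than split.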

Two consequences worth their own line: the heading path tells you which section a chunk is in — so you distill only the high‑signal ones and skip Conversation Excerpts for free; and the directory names are your entity table (every project is a top‑level dir).
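Both consequences fit in a few lines of Python. This is a sketch under assumptions: the path shapes follow the corpus table, the heading names come from the sections mentioned in the text, and the exact distill/skip split plus the default of skipping unknown headings are illustrative choices.

```python
from pathlib import PurePosixPath

# Heading names taken from the corpus description; which ones count as
# high-signal is an assumption to tune.
DISTILL = {"Key Decisions", "Observations", "Root cause", "Reflections"}
SKIP = {"Conversation Excerpts"}
KINDS = {"devlogs": "devlog", "claude-journal": "journal"}

def path_metadata(path):
    """Read kind, project, year, and month straight from the directory layout."""
    parts = PurePosixPath(path).parts
    for dirname, kind in KINDS.items():
        if dirname in parts:
            i = parts.index(dirname)
            project, year, month = parts[i + 1 : i + 4]
            return {"kind": kind, "project": project, "year": year, "month": month}
    if "notes" in parts:  # notes have no project directory
        i = parts.index("notes")
        return {"kind": "note", "project": None, "year": parts[i + 1], "month": None}
    return None

def route(heading_path):
    """Decide whether a chunk is distilled, based only on its heading path."""
    if any(h in SKIP for h in heading_path):
        return "skip"
    if any(h in DISTILL for h in heading_path):
        return "distill"
    return "skip"  # default: only known high-signal sections reach the extractor
```

For example, `route(("Journal", "09:00 work", "Key Decisions"))` returns `"distill"` without reading the chunk body at all.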

03 The org‑aware chunker

Subtree‑first, block‑aware, with tunable granularity. The original page demonstrates it interactively on a faithful slice of a real invoicekit devlog.

Chunk labels:
  • distill: high‑signal section
  • skip: routed out by heading
  • block: contains a no‑split region
  • oversize: would get a windowed sub‑cut
Each chunk's provenance (file, project, date, heading path, char span) rides into Wax as metadata.

04 The bridge, end to end


05 Chunk → fact: the mapping

Each extracted triple lands as a fact_assert with an sm_evidence row pointing back at the exact source span.

| Extraction output | → Wax | Source field |
|---|---|---|
| subject (canonical key) | fact_assert.subject | resolved, see §06 |
| predicate (controlled vocab) | fact_assert.predicate | |
| object (typed) | fact_assert.object | int / string / entity‑ref |
| entry date | valid_from_ms | #+DATE / path |
| extraction time | system_from_ms | now |
| chunk frame id | sm_evidence.source_frame_id | StructuredMemorySchema.swift:94 |
| char span in chunk | span_start_utf8/_end_utf8 | :95–97 |
| "org-bridge" + model id | extractor_id/_version | :99–100 |
| LLM confidence | confidence | :102 |
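As a rough illustration of this mapping, the Python sketch below turns one extracted triple into a fact record and an evidence record. The field names come from the table; everything else is an assumption, including the dict shapes, the `Triple` type, the sample values, and the reading of the `_utf8` suffix as UTF‑8 byte offsets (which would require converting from character offsets).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import time

@dataclass
class Triple:                 # hypothetical extractor output
    subject: str
    predicate: str
    object: object            # int, string, or entity reference
    confidence: float
    start_char: int           # character span inside the chunk
    end_char: int

def utf8_span(text, start_char, end_char):
    """Convert a character span to UTF-8 byte offsets (assumed meaning of *_utf8)."""
    start = len(text[:start_char].encode("utf-8"))
    end = start + len(text[start_char:end_char].encode("utf-8"))
    return start, end

def to_records(triple, chunk_text, frame_id, entry_date, model_id, now_ms=None):
    """Build the fact and evidence payloads; how they reach Wax is out of scope."""
    start, end = utf8_span(chunk_text, triple.start_char, triple.end_char)
    fact = {
        "subject": triple.subject,
        "predicate": triple.predicate,
        "object": triple.object,
        "valid_from_ms": int(entry_date.timestamp() * 1000),   # entry date
        "system_from_ms": now_ms if now_ms is not None else int(time.time() * 1000),
    }
    evidence = {
        "source_frame_id": frame_id,
        "span_start_utf8": start,
        "span_end_utf8": end,
        "extractor_id": "org-bridge",
        "extractor_version": model_id,
        "confidence": triple.confidence,
    }
    return fact, evidence

# Hypothetical chunk and triple, for illustration only.
chunk = "Décision: use SQLite"
triple = Triple("project:invoicekit", "decided", "use SQLite", 0.9, 10, 20)
fact, evidence = to_records(
    triple, chunk, frame_id=7,
    entry_date=datetime(2026, 5, 1, tzinfo=timezone.utc), model_id="example-model",
)
```

The accented character in the sample chunk makes the byte span differ from the character span, which is exactly the case where a citation would otherwise point one byte off.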


The bitemporal payoff for this corpus

valid_time = the day you wrote the devlog ("what was decided on 2026‑05‑01"); system_time = when the bridge learned it. So facts(about: project:invoicekit, asOf: <date>) reconstructs what you knew then, and re‑running with a better extractor opens new system‑time spans without losing history.
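A toy in-memory model shows why two time axes matter. It is not Wax's storage or API: the class, its method names, and the use of plain integers as timestamps are assumptions, and it models only the two behaviours described above (as-of reads and re-extraction opening a new system-time span).

```python
class BitemporalLog:
    """Minimal bitemporal fact log: valid time plus system time."""

    def __init__(self):
        self.rows = []

    def assert_fact(self, subject, predicate, obj, valid_from, system_from):
        # A re-extraction for the same subject/predicate/valid date closes
        # the open system-time span instead of overwriting it.
        for r in self.rows:
            same = (r["subject"], r["predicate"], r["valid_from"]) == (subject, predicate, valid_from)
            if same and r["system_to"] is None:
                r["system_to"] = system_from
        self.rows.append({
            "subject": subject, "predicate": predicate, "object": obj,
            "valid_from": valid_from,
            "system_from": system_from, "system_to": None,  # None = still current
        })

    def facts(self, about, valid_at, system_at):
        """What the log believed at `system_at` about the world as of `valid_at`."""
        return [
            r["object"] for r in self.rows
            if r["subject"] == about
            and r["valid_from"] <= valid_at
            and r["system_from"] <= system_at
            and (r["system_to"] is None or system_at < r["system_to"])
        ]

log = BitemporalLog()
# First extraction run at system time 20, for an entry dated 10.
log.assert_fact("project:invoicekit", "decided", "use sqlite", valid_from=10, system_from=20)
# A better extractor re-runs at system time 30 over the same entry.
log.assert_fact("project:invoicekit", "decided", "use SQLite with WAL", valid_from=10, system_from=30)
```

Querying with `system_at=25` still returns the first extraction, while `system_at=35` returns the improved one; both rows remain in the log.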

06 Identity — the part Wax won't do for you

Wax does zero co‑reference; assertFact binds subjects by exact key. So the bridge owns identity — but your corpus makes it easy.
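One way the bridge could own identity is to build the entity table from the project directory names and resolve every mention against it. The sketch below is an assumption-laden illustration: the `project:<name>` key format mirrors the key used earlier in the text, while the normalisation rule and function names are hypothetical.

```python
import re

def slug(name):
    """Normalise a mention: lowercase, keep only letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", name.lower())

def build_entity_table(project_dirs):
    """Map normalised names to canonical subject keys, one per project directory."""
    return {slug(d): f"project:{d}" for d in project_dirs}

def resolve_subject(mention, table):
    """Return the canonical key, or None so the caller can drop or review it."""
    return table.get(slug(mention))

table = build_entity_table(["invoicekit", "wax"])
```

Because the store binds subjects by exact key, returning `None` for unknown mentions is safer than minting a new key on the fly.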

Controlled predicate vocabulary (mandatory)

Wax interns predicates by exact key, so decided and made_decision would split into two. Lock a small set tuned to these logs; the extraction prompt maps prose onto it or drops the candidate.
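A small normaliser enforces the closed vocabulary before anything is asserted. The vocabulary and synonym entries below are hypothetical placeholders (only decided and made_decision appear in the text), so treat this as a sketch of the mechanism rather than a recommended predicate set.

```python
# Hypothetical closed vocabulary and synonym map; tune both to the corpus.
VOCAB = {"decided", "observed", "caused_by"}
SYNONYMS = {
    "made_decision": "decided",
    "chose": "decided",
    "noticed": "observed",
    "root_cause": "caused_by",
}

def normalize_predicate(raw):
    """Map a raw predicate onto the vocabulary, or return None to drop it."""
    key = raw.strip().lower().replace(" ", "_").replace("-", "_")
    key = SYNONYMS.get(key, key)
    return key if key in VOCAB else None
```

The extraction prompt does the first pass of this mapping; a check like this catches whatever the model gets wrong before it becomes a new interned predicate.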

07 Idempotency, the review gate, and the honest hard parts

What comes free
  • Re‑runs are safe: Wax dedups triples by SHA‑256(S,P,O) (UNIQUE(fact_hash)), so re‑extracting an unchanged chunk is a no‑op.
  • Skip unchanged files via #+SOURCE-FINGERPRINT / content hash.
  • Edits supersede: changed facts assert with version_relation:updates → the old span closes, history preserved.
  • Review gate: route sub‑threshold facts to a checklist before they go durable. Wax's DREAMS.md promotion flow is the model (AgentBrokerService+Markdown.swift:168).
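The first two bullets can be illustrated with a small Python sketch. It only mimics the behaviour described: how Wax actually serialises (S,P,O) before hashing is not stated here, so the JSON encoding is an assumption, and `FactStore` and `should_process` are hypothetical stand-ins.

```python
import hashlib
import json

def fact_hash(subject, predicate, obj):
    """SHA-256 over the triple; the JSON serialisation is an assumed encoding."""
    payload = json.dumps([subject, predicate, obj], ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class FactStore:
    """Stand-in for a table with UNIQUE(fact_hash)."""

    def __init__(self):
        self.by_hash = {}

    def assert_fact(self, subject, predicate, obj):
        h = fact_hash(subject, predicate, obj)
        if h in self.by_hash:
            return False  # identical triple: re-assertion is a no-op
        self.by_hash[h] = (subject, predicate, obj)
        return True

def should_process(fingerprint, seen_fingerprints):
    """Skip files whose #+SOURCE-FINGERPRINT was already handled."""
    if fingerprint in seen_fingerprints:
        return False
    seen_fingerprints.add(fingerprint)
    return True

store = FactStore()
first = store.assert_fact("project:invoicekit", "decided", "use SQLite")
second = store.assert_fact("project:invoicekit", "decided", "use SQLite")
```

Here `first` is `True` and `second` is `False`: the second run over an unchanged chunk adds nothing.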

Decisions for you
  • Near‑dup objects: if the LLM rewords an object on re‑run, the triple hash differs → near‑duplicate. Mitigate by canonicalizing objects, or dedup on (subject, predicate, source‑span) instead of object text.
  • Predicate governance: closed vocab (~15) vs open‑world. Start closed, grow deliberately.
  • Extractor model: qmd's qwen3 models are retrieval‑tuned, not extraction‑tuned. A small instruct model (or your gptel setup) prompted with heading context + the vocab is the realistic path.
  • Review gate is Markdown‑only in Wax — for an org‑first flow you'd accept a Markdown review file, or have the bridge manage its own org/.fact-review.org checklist.
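The two near‑duplicate mitigations from the first bullet might look like this in Python. Both are sketches under assumptions: the canonicalisation rules are illustrative and only catch surface variation, and the span‑keyed index is a bridge‑side structure, not something Wax is known to provide.

```python
import re
import unicodedata

def canonical_object(text):
    """Fold case, punctuation, and whitespace so trivial rewordings hash alike."""
    t = unicodedata.normalize("NFKC", text).casefold()
    t = re.sub(r"[^\w\s]", " ", t)
    return " ".join(t.split())

def span_key(subject, predicate, frame_id, start, end):
    """Identity by where the fact came from, not by how the object was worded."""
    return (subject, predicate, frame_id, start, end)

class SpanIndex:
    """Bridge-side index: one current object per (subject, predicate, source span)."""

    def __init__(self):
        self.current = {}

    def upsert(self, key, obj):
        """Return 'new', 'unchanged', or 'updated' for the caller to act on."""
        old = self.current.get(key)
        self.current[key] = obj
        if old is None:
            return "new"
        return "unchanged" if canonical_object(old) == canonical_object(obj) else "updated"

index = SpanIndex()
key = span_key("project:invoicekit", "decided", 7, 11, 21)
r1 = index.upsert(key, "Use SQLite.")
r2 = index.upsert(key, "use  sqlite")     # surface variation only
r3 = index.upsert(key, "adopt SQLite")    # genuine rewording from the same span
```

Canonicalisation absorbs the second call; the third is flagged as an update to one existing fact rather than becoming a second, near‑identical one.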

Pragmatic build

Given org‑first + your Emacs/Babashka stack, the realistic shape is a Babashka/Clojure (or TS) orchestrator: org‑parse → subtree chunk → extract → drive Wax via its CLI/daemon (fact_assert already takes an evidence arg; see AgentBrokerService.swift:1187). No Swift needed unless you want the custom‑embedder path from Part 2. First step: a 50‑file dry run over org/devlogs/invoicekit. The extracted facts will tell you fast whether they're worth keeping.