2026-03-25: Corpus Activation and Source Backfill
What changed
Cursor turned corpus ingestion into a documented activation workflow instead of a private-file event.
The repo now has a formal corpus activation protocol, a repeatable corpus-activate command, a
dedicated prompt-relay pattern for source activation, and a new internal research-brief surface for
important corpus additions.
Cursor then backfilled the two scanned books already added to the private corpus:
- LIB-0084 / Meditations on the Tarot
- LIB-0293 / Rites and Symbols of Initiation
Each now has a committed activation brief outside corpus/, confirmed corpus embeddings, and
updated linkage into the KB workflow. LIB-0293 also triggered a focused audit of the Eliade-linked
concept cluster so the KB now states more clearly where Eliade is structurally useful and where his
comparative method must be handled with caution.
Why
Before this pass, a scanned or uploaded source could be successfully ingested and still remain operationally invisible. The text would exist in private storage, but agents would have to remember manually that it should affect research, KG structure, source pages, and future writing.
This work created a committed awareness layer outside private corpus/ so newly ingested sources
become discoverable to the rest of the project without exposing copyrighted full text publicly.
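The awareness-layer idea can be illustrated with a small check: given a list of ingested source IDs, report which ones have no committed brief yet. This is a minimal sketch, not what scripts/corpus-activate.py actually does. The function name `find_unactivated` and the `BRIEF_NAME` pattern are hypothetical, and the sketch assumes briefs are named `<LIB-ID>_<slug>.md`, as the library-sources paths in this entry suggest.

```python
import re
import tempfile
from pathlib import Path

# Assumption: activation briefs are named "<LIB-ID>_<slug>.md", matching the
# library-sources filenames listed in this entry.
BRIEF_NAME = re.compile(r"^(LIB-\d{4})_.+\.md$")


def find_unactivated(source_ids, briefs_dir):
    """Return the source IDs that have no brief file in briefs_dir (hypothetical helper)."""
    briefed = set()
    for path in Path(briefs_dir).glob("*.md"):
        match = BRIEF_NAME.match(path.name)
        if match:
            briefed.add(match.group(1))
    # Anything ingested but not briefed is still "operationally invisible".
    return sorted(set(source_ids) - briefed)


if __name__ == "__main__":
    # Demo against a throwaway directory rather than the real repo.
    with tempfile.TemporaryDirectory() as tmp:
        briefs = Path(tmp)
        (briefs / "LIB-0084_tomberg-meditations-on-the-tarot.md").write_text("brief\n")
        print(find_unactivated(["LIB-0084", "LIB-0293"], briefs))
```

In a real run, `briefs_dir` would presumably point at kb/research-vault/library-sources/, and the ID list would come from whatever index the private corpus maintains. Both are assumptions here.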
Files and prompts
Primary files in this work session:
- docs/corpus-activation-protocol.md
- docs/corpus-digitization-plan.md
- Makefile
- scripts/corpus-activate.py
- scripts/embed.py
- scripts/embed-kb.py
- scripts/ingest-open-source.py
- scripts/ingest-pdf.py
- prompts/patterns/corpus-activation.md
- kb/research-vault/README.md
- kb/research-vault/library-sources/LIB-0084_tomberg-meditations-on-the-tarot.md
- kb/research-vault/library-sources/LIB-0293_eliade-rites-and-symbols-of.md
- kb/concepts/CON-0001_initiation.md
- kb/concepts/CON-0002_katabasis.md
- kb/concepts/CON-0003_epopteia.md
- kb/concepts/CON-0015_hierophany.md
- kb/figures/FIG-0001_eliade-mircea.md
Relevant prompt relays:
- prompts/completed/PR-0036_claude-code_lib-0293-activation.md
- prompts/completed/PR-0033_codex_lib-0084-source-site-polish.md
- prompts/queue/PR-0034_claude-code_lib-0084-source-copy.md
Commit: pending or handled in later git orchestration.
Type
Operational, architectural, and internal-knowledge workflow.
Follow-up
- Use `make corpus-activate` for future scanned or uploaded source additions instead of treating ingest as the end of the workflow.
- Keep corpus full text and PDFs private unless a separate human decision changes the exposure boundary.
- Reuse the `kb/research-vault/library-sources/` surface as the default bridge between raw corpus text and downstream writing work.