2026-03-25: Corpus Activation and Source Backfill

What changed

Cursor turned corpus ingestion into a documented activation workflow instead of a private-file event.

The repo now has a formal corpus activation protocol, a repeatable corpus-activate command (run as make corpus-activate), a dedicated prompt-relay pattern for source activation, and a new internal research-brief surface for important corpus additions.

Cursor then backfilled the two scanned books already added to the private corpus:

  • LIB-0084 / Meditations on the Tarot
  • LIB-0293 / Rites and Symbols of Initiation

Each now has a committed activation brief outside corpus/, confirmed corpus embeddings, and updated linkage into the KB workflow. LIB-0293 also triggered a focused audit of the Eliade-linked concept cluster so the KB now states more clearly where Eliade is structurally useful and where his comparative method must be handled with caution.

Why

Before this pass, a scanned or uploaded source could be successfully ingested and still remain operationally invisible. The text would exist in private storage, but agents would have to remember manually that it should affect research, KG structure, source pages, and future writing.

This work created a committed awareness layer outside private corpus/ so newly ingested sources become discoverable to the rest of the project without exposing copyrighted full text publicly.
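
The gap described above can be checked mechanically. The following is a minimal sketch, not the contents of scripts/corpus-activate.py: it assumes that files in the private corpus directory and in the committed briefs directory both begin with a LIB-NNNN identifier (the brief naming matches the library-sources files listed below; the corpus file naming is an assumption), and it reports sources that were ingested but have no committed brief yet. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: every relevant file name starts with an ID such as "LIB-0293".
LIB_ID = re.compile(r"LIB-\d{4}")


def library_ids(directory: Path) -> set[str]:
    """Collect LIB-NNNN identifiers from file names directly inside directory."""
    if not directory.is_dir():
        return set()
    ids = set()
    for path in directory.iterdir():
        match = LIB_ID.match(path.name)
        if match:
            ids.add(match.group(0))
    return ids


def unactivated_sources(corpus_dir: Path, briefs_dir: Path) -> list[str]:
    """Return IDs present in the corpus but missing a committed brief."""
    return sorted(library_ids(corpus_dir) - library_ids(briefs_dir))
```

Called as unactivated_sources(Path("corpus"), Path("kb/research-vault/library-sources")), an empty list would mean every ingested source has a visible brief. Only file names are compared, so no corpus text is read or exposed.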

Files and prompts

Primary files in this work session:

  • docs/corpus-activation-protocol.md
  • docs/corpus-digitization-plan.md
  • Makefile
  • scripts/corpus-activate.py
  • scripts/embed.py
  • scripts/embed-kb.py
  • scripts/ingest-open-source.py
  • scripts/ingest-pdf.py
  • prompts/patterns/corpus-activation.md
  • kb/research-vault/README.md
  • kb/research-vault/library-sources/LIB-0084_tomberg-meditations-on-the-tarot.md
  • kb/research-vault/library-sources/LIB-0293_eliade-rites-and-symbols-of.md
  • kb/concepts/CON-0001_initiation.md
  • kb/concepts/CON-0002_katabasis.md
  • kb/concepts/CON-0003_epopteia.md
  • kb/concepts/CON-0015_hierophany.md
  • kb/figures/FIG-0001_eliade-mircea.md

Relevant prompt relays:

  • prompts/completed/PR-0036_claude-code_lib-0293-activation.md
  • prompts/completed/PR-0033_codex_lib-0084-source-site-polish.md
  • prompts/queue/PR-0034_claude-code_lib-0084-source-copy.md

Commit: pending, or to be handled in later git orchestration.

Type

Operational, architectural, and internal-knowledge workflow.

Follow-up

  • Use make corpus-activate for future scanned or uploaded source additions instead of treating ingest as the end of the workflow.
  • Keep corpus full text and PDFs private unless a separate human decision changes the exposure boundary.
  • Reuse the kb/research-vault/library-sources/ surface as the default bridge between raw corpus text and downstream writing work.
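
The exposure boundary in the follow-up items could be guarded with a small check. This is a sketch under stated assumptions, not an existing script in the repo: it takes a list of repository-relative paths and flags anything under corpus/ or any PDF, on the assumption that neither should be committed. The function name boundary_violations is hypothetical.

```python
from pathlib import PurePosixPath


def boundary_violations(tracked_paths: list[str]) -> list[str]:
    """Return tracked paths that sit under corpus/ or are PDF files."""
    violations = []
    for raw in tracked_paths:
        path = PurePosixPath(raw)
        # Assumption: the private corpus lives in a top-level "corpus" directory.
        in_corpus = len(path.parts) > 0 and path.parts[0] == "corpus"
        is_pdf = path.suffix.lower() == ".pdf"
        if in_corpus or is_pdf:
            violations.append(raw)
    return violations
```

The input list could come from the output of git ls-files, split on newlines. A non-empty result would signal that private material has crossed into the committed tree and needs a human decision.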