Raw files are transientDerived artifacts persist0 assets staged locally
Ingestion pipeline
The app server stays CPU-only. It parses, chunks, ranks, and records provenance, while paid model APIs are reached through LiteLLM only.
01
Transient upload
Accept teacher files long enough to parse them. Raw bytes are not a permanent asset class in this system.
02
Structured parse
Extract page, slide, section, transcript, and layout data first. OCR is the fallback, not the primary path.
03
Chunk and index
Create retrieval chunks with lexical and dense search surfaces, then attach concept metadata and citation offsets.
04
Approve the brain
Teachers review the ingestion summary before the course brain becomes the active retrieval source.
Persistent artifacts
asset metadata
page or slide text
layout JSON
chunk embeddings
citation offsets
optional cited-page thumbnails
Guardrails
Never keep raw files by default after successful parsing.
Never mix chunks from different course spaces in retrieval.
Never let the frontend call model vendors directly.
Always capture enough provenance to show page, slide, or timestamp-level citations.
Live course asset register
What has already been staged in local development
No course exists yet. Create a course on the dashboard and stage the first asset there.