Raw files are transient
Derived artifacts persist
0 assets staged locally

Ingestion pipeline

The app server stays CPU-only: it parses, chunks, ranks, and records provenance. Paid model APIs are reached only through LiteLLM.

01

Transient upload

Accept teacher files only long enough to parse them. Raw bytes are not a permanent asset class in this system.
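
One way to keep uploads transient is to confine the raw bytes to a temporary directory that disappears once parsing returns. The sketch below is illustrative only: `parse_transient` and the `parse` callback are hypothetical names, not part of this system.

```python
import tempfile
from pathlib import Path
from typing import Callable


def parse_transient(raw: bytes, suffix: str, parse: Callable[[Path], dict]) -> dict:
    """Write the upload to a temp directory, parse it, then discard the raw bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"upload{suffix}"
        path.write_bytes(raw)
        derived = parse(path)  # only derived data leaves this block
    # The temp directory, and the raw file inside it, no longer exists here.
    return derived
```

Note that this sketch also deletes the file when parsing fails, which is stricter than "after successful parsing"; a real pipeline might keep a failed upload briefly for retry.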

02

Structured parse

Extract page, slide, section, transcript, and layout data first. OCR is the fallback, not the primary path.
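
The "structured first, OCR as fallback" ordering can be sketched as a small dispatcher. Both extractors are passed in as callables because the actual parsing and OCR libraries are not named here; `extract_page_text` is a hypothetical helper.

```python
from typing import Callable, Optional, Tuple


def extract_page_text(
    page: bytes,
    structured: Callable[[bytes], Optional[str]],
    ocr: Callable[[bytes], str],
) -> Tuple[str, str]:
    """Return (text, method); OCR runs only when the structured path yields nothing."""
    text = structured(page)
    if text and text.strip():
        return text, "structured"
    return ocr(page), "ocr"
```

Returning the method alongside the text lets the ingestion summary report how many pages needed OCR.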

03

Chunk and index

Create retrieval chunks with lexical and dense search surfaces, then attach concept metadata and citation offsets.
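
As a rough illustration of chunks that carry citation offsets, the sketch below splits page text into overlapping character windows and records where each window starts and ends. Fixed-size character windows, the `Chunk` fields, and the default sizes are all assumptions; the lexical and dense indexes and the concept metadata are out of scope here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    course_id: str
    page: int
    start: int  # character offset into the page text, inclusive
    end: int    # character offset into the page text, exclusive
    text: str


def chunk_page(course_id: str, page: int, text: str,
               size: int = 400, overlap: int = 50) -> List[Chunk]:
    """Split page text into overlapping windows, keeping offsets for citations."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks: List[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append(Chunk(course_id, page, start, end, text[start:end]))
        if end == len(text):
            break
        start = end - overlap  # step back so neighbouring chunks share context
    return chunks
```

Because every chunk satisfies `text == page_text[start:end]`, a citation can highlight the exact span on the source page or slide.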

04

Approve the brain

Teachers review the ingestion summary before the course brain becomes the active retrieval source.
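
The approval step amounts to a gate between a pending ingestion and the version that retrieval reads. A minimal sketch, assuming integer version numbers and a hypothetical `CourseBrain` record:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CourseBrain:
    active_version: Optional[int] = None   # what retrieval reads from
    pending_version: Optional[int] = None  # ingested, awaiting teacher review

    def stage(self, version: int) -> None:
        """Record a finished ingestion without exposing it to retrieval."""
        self.pending_version = version

    def approve(self) -> None:
        """Promote the pending version once the teacher accepts the summary."""
        if self.pending_version is None:
            raise RuntimeError("nothing to approve")
        self.active_version = self.pending_version
        self.pending_version = None
```

Until `approve()` is called, `active_version` is unchanged, so students never query a brain the teacher has not reviewed.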

Persistent artifacts

asset metadata
page or slide text
layout JSON
chunk embeddings
citation offsets
optional cited-page thumbnails
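
The list above could map onto a per-page record like the following. The field names and types are assumptions made for illustration; the point is that the record has no field for raw file bytes.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PageArtifact:
    asset_id: str                  # asset metadata (hypothetical identifier)
    course_id: str
    page: int                      # page or slide number
    text: str                      # page or slide text
    layout: dict                   # layout JSON
    chunk_embeddings: List[List[float]] = field(default_factory=list)
    citation_offsets: List[List[int]] = field(default_factory=list)  # [start, end] pairs
    thumbnail_path: Optional[str] = None  # set only for cited pages


record = PageArtifact("a1", "bio-101", 3, "Cell membranes regulate transport.", {"blocks": []})
serialized = json.dumps(asdict(record))
```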

Guardrails

Never keep raw files by default after successful parsing.
Never mix chunks from different course spaces in retrieval.
Never let the frontend call model vendors directly.
Always capture enough provenance to show page, slide, or timestamp-level citations.
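
The course-isolation guardrail is easiest to enforce by filtering on the course before any ranking happens. The sketch below uses naive term overlap as a stand-in for the real lexical and dense ranking; `retrieve` and the dictionary keys are hypothetical.

```python
from typing import Dict, List


def retrieve(chunks: List[Dict], course_id: str, query: str, k: int = 3) -> List[Dict]:
    """Filter by course first, then rank by naive term overlap."""
    terms = set(query.lower().split())
    # Chunks from other course spaces are dropped before scoring, not after.
    in_course = [c for c in chunks if c["course_id"] == course_id]
    scored = [(len(terms & set(c["text"].lower().split())), c) for c in in_course]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]
```

Each returned chunk keeps its provenance keys (for example `page`), so the caller can render a page-level citation.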

Live course asset register

What has already been staged in local development

0 pending parse
No course exists yet. Create a course on the dashboard and stage the first asset there.