ContextDistill

A modular CLI pipeline that converts DOCX, PDF, HTML, and URLs into clean Markdown optimized for LLM consumption.

Features

  • Multi-format input (DOCX, PDF, HTML, URL)
  • OCR via local models (LM Studio/Ollama)
  • Boilerplate removal with trafilatura
  • Server-Sent Events (SSE) streaming for progress feedback
  • Protocol-based architecture for extensibility
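
As a rough illustration of what a protocol-based architecture can look like, the sketch below uses Python's `typing.Protocol` to define a converter interface and dispatch on the input. Every name here (`Converter`, `can_handle`, `to_markdown`, `HtmlConverter`, `convert`) is hypothetical and chosen for the example; none of it is taken from ContextDistill's actual code.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class Converter(Protocol):
    """Hypothetical interface: any class with these two methods qualifies."""

    def can_handle(self, source: str) -> bool: ...

    def to_markdown(self, source: str) -> str: ...


class HtmlConverter:
    """Toy stand-in for an HTML stage; real extraction logic would go here."""

    def can_handle(self, source: str) -> bool:
        return source.lower().endswith((".html", ".htm"))

    def to_markdown(self, source: str) -> str:
        # Placeholder output only: no file is read in this sketch.
        return f"# Converted from {source}\n"


def convert(source: str, converters: Iterable[Converter]) -> str:
    """Return the output of the first converter that accepts the source."""
    for converter in converters:
        if converter.can_handle(source):
            return converter.to_markdown(source)
    raise ValueError(f"no converter registered for {source!r}")


if __name__ == "__main__":
    print(convert("page.html", [HtmlConverter()]))
```

Because the interface is structural, a new format could be supported by adding another class with the same two methods, with no base class to inherit from and no change to `convert`.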