ContextDistill
A modular CLI pipeline that converts DOCX, PDF, and HTML files, as well as web pages fetched by URL, into clean Markdown optimized for LLM consumption.
Features
- Multi-format input (DOCX, PDF, HTML, URL)
- OCR via local models (LM Studio/Ollama)
- Boilerplate removal with trafilatura
- Server-Sent Events (SSE) streaming for progress feedback
- Protocol-based architecture for extensibility
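
To make the last two features more concrete, here is a minimal sketch of how a protocol-based converter registry and SSE progress messages could fit together. It assumes the project is written in Python and that "protocol-based" refers to structural typing with `typing.Protocol`; neither is confirmed above. Every name in it (`Converter`, `HtmlFileConverter`, `UrlConverter`, `pick_converter`, `sse_event`, `run`) is hypothetical and not taken from ContextDistill itself.

```python
import json
from typing import Iterator, Protocol, runtime_checkable


@runtime_checkable
class Converter(Protocol):
    """Hypothetical interface: any class with these two methods qualifies."""

    def accepts(self, source: str) -> bool: ...

    def to_markdown(self, source: str) -> str: ...


class UrlConverter:
    """Toy converter for http(s) sources; a real one would fetch the page."""

    def accepts(self, source: str) -> bool:
        return source.startswith(("http://", "https://"))

    def to_markdown(self, source: str) -> str:
        return f"# Fetched from {source}\n"


class HtmlFileConverter:
    """Toy converter for local HTML files; a real one would extract the content."""

    def accepts(self, source: str) -> bool:
        return source.lower().endswith((".html", ".htm"))

    def to_markdown(self, source: str) -> str:
        return f"# Converted from {source}\n"


def pick_converter(source: str, converters: list[Converter]) -> Converter:
    """Return the first registered converter that accepts the source."""
    for converter in converters:
        if converter.accepts(source):
            return converter
    raise ValueError(f"no converter accepts {source!r}")


def sse_event(event: str, payload: dict) -> str:
    """Format one SSE message: an event line, a data line, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def run(source: str, converters: list[Converter]) -> Iterator[str]:
    """Convert one source, yielding SSE-formatted progress messages."""
    converter = pick_converter(source, converters)
    yield sse_event("progress", {"stage": "selected",
                                 "converter": type(converter).__name__})
    yield sse_event("done", {"markdown": converter.to_markdown(source)})


if __name__ == "__main__":
    # Order matters: URLs are checked before file extensions.
    registry = [UrlConverter(), HtmlFileConverter()]
    for message in run("notes.html", registry):
        print(message, end="")
```

In this sketch, supporting a new format means writing one more class with `accepts` and `to_markdown` and adding it to the registry; no base class is needed because the protocol is matched structurally.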