After battling Python dependencies, slow processing, and deployment headaches with tools like unstructured, I finally snapped and built Ferrules, a blazing-fast document parser written in Rust.
Why Ferrules?
- Speed – Native PDF parsing (pdfium), hardware-accelerated ML inference
- No Python – Single binary, zero-dependency deployment, built-in tracing
- Smart Processing – Layout detection, OCR, intelligent element merging
- Flexible Outputs – JSON, HTML, Markdown (ideal for RAG pipelines)
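To give a feel for what "element merging" means, here is a minimal sketch, not Ferrules' actual code or algorithm. It assumes a hypothetical `TextBox` type and a hypothetical `merge_lines` helper that joins detected text lines into one block whenever the vertical gap between them is small. The gap threshold is an arbitrary illustrative parameter.

```rust
/// Hypothetical text box detected on a page (y grows downward).
/// This type is an assumption for illustration, not a Ferrules type.
#[derive(Debug, Clone, PartialEq)]
struct TextBox {
    top: f32,
    bottom: f32,
    text: String,
}

/// Merge boxes into blocks: a box joins the previous block when the
/// vertical gap between them is at most `max_gap`.
fn merge_lines(mut boxes: Vec<TextBox>, max_gap: f32) -> Vec<TextBox> {
    // Process boxes top to bottom.
    boxes.sort_by(|a, b| a.top.total_cmp(&b.top));

    let mut merged: Vec<TextBox> = Vec::new();
    for b in boxes {
        // Decide first, then mutate, to keep the borrows simple.
        let can_merge = merged
            .last()
            .map_or(false, |prev| b.top - prev.bottom <= max_gap);

        if can_merge {
            let prev = merged.last_mut().unwrap(); // non-empty: can_merge is true
            prev.bottom = prev.bottom.max(b.bottom);
            prev.text.push(' ');
            prev.text.push_str(&b.text);
        } else {
            merged.push(b);
        }
    }
    merged
}
```

Calling `merge_lines` with a gap of a few points would fold the lines of one paragraph into a single block while keeping a distant heading or the next paragraph separate. A real parser would also look at horizontal overlap, font size, and the layout class of each element.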
Tech Highlights
- Runs layout detection on Apple Neural Engine/GPU
- Apple Vision API for high-quality OCR (macOS)
- Multithreaded, CLI + HTTP API server
- Debug mode with visual parsing output
If you're tired of Python-based parsers in production, check it out!
(P.S. Named after those metal rings on pencils, because it keeps your documents structured.)