Ferrules: A fast document parser written in Rust (github.com)
1 points by amindiro 470 days ago

After battling Python dependencies, slow processing, and deployment headaches with tools like unstructured, I finally snapped and built Ferrules, a blazing-fast document parser in Rust.

Why Ferrules?
- Speed – Native PDF parsing (pdfium), hardware-accelerated ML inference
- No Python – Single binary, zero-dependency deployment, built-in tracing
- Smart Processing – Layout detection, OCR, intelligent element merging
- Flexible Outputs – JSON, HTML, Markdown (ideal for RAG pipelines)

Tech Highlights
- Runs layout detection on Apple Neural Engine/GPU
- Apple Vision API for high-quality OCR (macOS)
- Multithreaded, CLI + HTTP API server
- Debug mode with visual parsing output

If you're tired of Python-based parsers in production, check it out!

(P.S. It's named after those metal rings on pencils, because it keeps your documents structured.)