Hacker News
Show HN: Side-by-side PDF parser comparison for RAG pipelines (github.com)
2 points by 2dogsanerd 206 days ago
A simple tool to compare how different PDF parsers handle your documents.

Shows naive parsing (pypdf) vs layout-aware parsing (Docling) side-by-side.

Helps spot issues with scans, tables, and multi-column layouts before they cause problems in your RAG system.

Parsers are easy to swap if you want to try alternatives.
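
For anyone curious what "easy to swap" could look like, here is a minimal sketch of the idea, not the code from the repo. It assumes each parser is just a function from a PDF path to a string; the names `PARSERS`, `compare`, and `side_by_side` are made up for illustration. The pypdf and Docling calls follow their documented basic usage, and both libraries are imported lazily so the rest runs without them installed.

```python
from itertools import zip_longest
from typing import Callable, Dict

Parser = Callable[[str], str]  # takes a PDF path, returns extracted text


def parse_with_pypdf(path: str) -> str:
    """Naive extraction: join the raw text of each page."""
    from pypdf import PdfReader  # lazy import; requires `pip install pypdf`

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def parse_with_docling(path: str) -> str:
    """Layout-aware extraction: convert the document, export Markdown."""
    from docling.document_converter import DocumentConverter  # lazy import

    result = DocumentConverter().convert(path)
    return result.document.export_to_markdown()


# Hypothetical registry: adding or swapping a parser is one dict entry.
PARSERS: Dict[str, Parser] = {
    "pypdf": parse_with_pypdf,
    "docling": parse_with_docling,
}


def compare(path: str, parsers: Dict[str, Parser] = PARSERS) -> Dict[str, str]:
    """Run every parser on the same file; record failures instead of raising."""
    results: Dict[str, str] = {}
    for name, parse in parsers.items():
        try:
            results[name] = parse(path)
        except Exception as exc:
            results[name] = f"[{name} failed: {exc}]"
    return results


def side_by_side(left: str, right: str, width: int = 40) -> str:
    """Render two texts as fixed-width columns, one source line per row."""
    rows = zip_longest(left.splitlines(), right.splitlines(), fillvalue="")
    return "\n".join(f"{l[:width]:<{width}} | {r[:width]}" for l, r in rows)
```

Usage would be something like `out = compare("report.pdf")` followed by `print(side_by_side(out["pypdf"], out["docling"]))`, where `report.pdf` is a placeholder path. To try another parser, add a function with the same signature to the dict.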

1 comment

Poppler is also good