by vissidarte_choi 816 days ago
This is because PDF comes in so many different versions. A single third-party tool like pdfplumber won't fit them all. For example, parsing some PDFs with pdfplumber raises exceptions. Sometimes fitz works in cases that pdfplumber doesn't handle well. It looks a bit complicated, but RAGFlow uses multiple parsing tools to handle different types of PDFs.
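
To illustrate the general idea, here is a minimal sketch of a fallback chain: try pdfplumber first, and if it raises or returns nothing, try fitz (PyMuPDF). This is not RAGFlow's actual code; the function names (`extract_with_pdfplumber`, `extract_with_fitz`, `extract_text`) and the "empty text counts as failure" rule are assumptions made up for the example.

```python
from typing import Callable, Sequence


def extract_with_pdfplumber(path: str) -> str:
    # Imported lazily so the fallback logic itself has no hard dependency.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() may return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_fitz(path: str) -> str:
    import fitz  # PyMuPDF

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_text(
    path: str,
    parsers: Sequence[Callable[[str], str]] = (
        extract_with_pdfplumber,
        extract_with_fitz,
    ),
) -> str:
    """Try each parser in order and return the first non-empty result."""
    errors = []
    for parser in parsers:
        try:
            text = parser(path)
        except Exception as exc:  # parsers fail in many different ways
            errors.append(f"{parser.__name__}: {exc!r}")
            continue
        if text.strip():
            return text
        errors.append(f"{parser.__name__}: no text extracted")
    raise RuntimeError("all parsers failed: " + "; ".join(errors))
```

Calling `extract_text("some.pdf")` would then use whichever parser succeeds first. Catching a broad `Exception` is deliberate here, since the point is that a parser can fail in unpredictable ways on an unusual PDF; the collected errors are kept so the final failure message shows what each tool complained about.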