Hacker News
by epaga 878 days ago
This looks like it could be very helpful. The company I work for has a PDF comparison tool called "PDFC" which can read PDFs and compare them for semantic differences. https://www.inetsoftware.de/products/pdf-content-comparer

Parsing PDFs can be quite the headache because the format is so complex. We already support most of these features, but there are always so many edge cases that additional angles can be very helpful.


You're absolutely right; parsing PDFs can be a real headache because of the format's inherent complexity. Individual files vary widely in internal structure, layout, and embedded components, which makes it difficult to extract and compare information consistently. Even with robust tools like PDFC, new edge cases keep emerging and require further refinement.
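As a rough illustration of why comparing content rather than raw output matters, here is a minimal Python sketch. It assumes the text has already been extracted from each PDF by some parser (not shown), and it treats a few common layout artifacts (line breaks, repeated spaces, ligature characters such as U+FB01 "ﬁ", words hyphenated across lines) as noise rather than content changes. The function names `normalize` and `semantic_diff` and the normalization rules are illustrative assumptions, not how PDFC or any particular tool works.

```python
import difflib
import re
import unicodedata


def normalize(text: str) -> list[str]:
    """Reduce extracted PDF text to a list of words, ignoring layout noise.

    Assumption: `text` came from some PDF text extractor. NFKC folds
    compatibility characters (e.g. the "fi" ligature U+FB01, non-breaking
    spaces), and words split by a hyphen at a line break are rejoined.
    """
    text = unicodedata.normalize("NFKC", text)
    # Heuristic: "com-\nplex" -> "complex". Real hyphenated compounds that
    # happen to break at a line end would also be joined.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Splitting on any whitespace discards line breaks and spacing differences.
    return text.split()


def semantic_diff(old: str, new: str) -> list[str]:
    """Return word-level deletions ("- w") and insertions ("+ w")."""
    old_words = normalize(old)
    new_words = normalize(new)
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            changes.extend(f"- {w}" for w in old_words[i1:i2])
        if tag in ("insert", "replace"):
            changes.extend(f"+ {w}" for w in new_words[j1:j2])
    return changes
```

For example, `semantic_diff("The \ufb01le format is so com-\nplex", "The file format is so complex and varied")` returns `["+ and", "+ varied"]`: the ligature and the line-break hyphen are ignored, and only the genuinely new words are reported. Real documents add many more cases (reading order across columns, tables, fonts without Unicode mappings), which is exactly where the edge cases come from.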