Hacker News
by macklinkachorn 496 days ago
In my previous role, I experienced something similar: the rule-based parsing approach was really tricky to get right and often failed on edge cases.

We (at https://runtrellis.com/) have been building a PDF processing pipeline from the ground up with LLMs and VLMs and have seen close to 100% accuracy, even on tricky PDFs. The key is to use a rule-based engine and references to cross-check the extracted data.
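The comment doesn't spell out how the cross-checking works. Here is a minimal sketch of the general idea, assuming the LLM/VLM has already returned structured fields. The `Extraction` fields, the `INV-` ID format, and the `KNOWN_VENDORS` reference set are hypothetical illustrations, not the Trellis pipeline. Deterministic rules (format, arithmetic, and lookup against reference data) then validate the model's output instead of doing the parsing themselves.

```python
import re
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Extraction:
    # Hypothetical fields an LLM/VLM might return for an invoice-like PDF.
    invoice_id: str
    line_items: list[Decimal]
    total: Decimal
    vendor: str


@dataclass
class CheckResult:
    passed: bool
    issues: list[str] = field(default_factory=list)


# Hypothetical reference data, e.g. a vendor list pulled from another system.
KNOWN_VENDORS = {"Acme Corp", "Globex"}

# Hypothetical ID format used only for illustration.
INVOICE_ID_PATTERN = re.compile(r"INV-\d{6}")


def cross_check(ex: Extraction, known_vendors: set[str] = KNOWN_VENDORS) -> CheckResult:
    """Validate model-extracted fields with deterministic rules and reference data."""
    issues: list[str] = []

    # Rule 1: format check on the identifier.
    if not INVOICE_ID_PATTERN.fullmatch(ex.invoice_id):
        issues.append(f"invoice_id {ex.invoice_id!r} does not match expected format")

    # Rule 2: internal arithmetic consistency (line items must sum to the total).
    items_sum = sum(ex.line_items, Decimal("0"))
    if items_sum != ex.total:
        issues.append(f"line items sum to {items_sum}, but total is {ex.total}")

    # Rule 3: cross-check against external reference data.
    if ex.vendor not in known_vendors:
        issues.append(f"vendor {ex.vendor!r} not found in reference data")

    return CheckResult(passed=not issues, issues=issues)


if __name__ == "__main__":
    good = Extraction("INV-000123", [Decimal("10.00"), Decimal("5.50")], Decimal("15.50"), "Acme Corp")
    bad = Extraction("123", [Decimal("10.00")], Decimal("12.00"), "Unknown Ltd")
    for ex in (good, bad):
        result = cross_check(ex)
        print(result.passed, result.issues)
```

Using `Decimal` rather than `float` keeps the sum-equals-total rule from failing on rounding noise. In a setup like this, documents that fail any rule could be re-extracted or routed to human review rather than trusted as-is.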