| HN Mirror



	by macklinkachorn 496 days ago
	In my previous role, I ran into similar problems: the rule-based parsing approach is really tricky to get right and often failed on edge cases. We (at https://runtrellis.com/) have been building a PDF processing pipeline from the ground up with LLMs and VLMs and have seen close to 100% accuracy even for tricky PDFs. The key is to use a rule-based engine and references to cross-check the extracted data.
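
For anyone wondering what a rule-based cross-check on model output could look like, here is a minimal Python sketch. It is not the actual Trellis pipeline: the record schema, the `cross_check` helper, and the three rules are all assumptions made up for illustration.

```python
import re
from decimal import Decimal


def cross_check(extracted: dict, source_text: str) -> list[str]:
    """Return the rule violations found in one extracted record.

    `extracted` follows a hypothetical invoice schema; a real pipeline
    would have its own fields and rules.
    """
    problems = []

    # Rule 1: the line items must add up to the stated total.
    items_sum = sum(Decimal(x) for x in extracted["line_items"])
    if items_sum != Decimal(extracted["total"]):
        problems.append(
            f"line items sum to {items_sum}, total says {extracted['total']}"
        )

    # Rule 2: the date must look like YYYY-MM-DD.
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", extracted["date"]):
        problems.append(f"unexpected date format: {extracted['date']}")

    # Rule 3 (reference check): the invoice number must appear verbatim
    # in the raw text pulled from the PDF.
    if extracted["invoice_number"] not in source_text:
        problems.append("invoice number not found in source text")

    return problems


# Made-up example data, standing in for raw PDF text and model output.
source_text = "Invoice INV-1042 dated 2024-03-01. Total due: 150.00"
good = {
    "invoice_number": "INV-1042",
    "date": "2024-03-01",
    "line_items": ["100.00", "50.00"],
    "total": "150.00",
}
bad = dict(good, invoice_number="INV-1043", total="155.00")

good_problems = cross_check(good, source_text)
bad_problems = cross_check(bad, source_text)
```

The idea is that the model does the extraction and cheap deterministic rules decide whether the result can be trusted; records with a non-empty problem list would be sent back for a retry or to human review.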