Hacker News new | ask | show | jobs
by nxrabl 360 days ago
Very interesting! Is this the state of the art for accurate OCR of tabular PDFs, or is there other work in the space to compare against?
1 comments

There's lots of posts on HN for developments and companies doing OCR and Document Extraction. It's a classic CV problem but still has come a long way in the past couple years
Yeah, this is a very well-traveled road, but LLMs have made some big improvements. If you asked me (the guy who wrote the original piece linked above) what I'd use if accuracy alone was the goal, probably would be AWS Textract. But accuracy and structure? Gemini.