| HN Mirror



	by nxrabl 360 days ago
	Very interesting! Is this the state of the art for accurate OCR of tabular PDFs, or is there other work in the space to compare against?


SnooSux 360 days ago

There are lots of posts on HN about developments and companies doing OCR and document extraction. It's a classic CV problem, but it has still come a long way in the past couple of years.


dwillis 360 days ago

Yeah, this is a very well-traveled road, but LLMs have made some big improvements. If you asked me (the guy who wrote the original piece linked above) what I'd use if accuracy alone was the goal, it would probably be AWS Textract. But accuracy and structure? Gemini.
