Hacker News new | ask | show | jobs
by throwaway4496 325 days ago
Yes, and don't for a second think this approach of rastering and OCR'ing is sane, let alone a reasonable choice. It is outright absurd.
1 comments

Noone has claimed getting structured data out of pdfs are sane. What you seem to be missing is that there are no sane ways to get a decent output. The reasonable choice would be to not even try, but business needs invalidate that choice. So what remain is the absurd ways to solve the problem.