Hacker News new | ask | show | jobs
by advisedwang 471 days ago
A lot of times you are OCRing documents from people who do not care about how easy it is for the reader to extract data. A common example is regulatory filings - the goal is to comply with the law, not help people read your data. Or perhaps it's from a source that sells the data or has copyright and doesn't want to make it easy for other people to use in ways besides their intention. etc.