by dredmorbius 4416 days ago
One thing I do have to give The New York Times credit for is that it's got an exceptionally good digital archive. All Web content ever posted is available online in full form.

Published articles at least through the early 20th century are indexed, typically with the lede paragraph or sentence. I'd love to have more, but that's a start.

1 comment

If they had Google's OCR tech, it would be much better than it is. I wonder if Google ever thought about making a cloud OCR API product. It would align with their goals.
OCR isn't even necessary. There's also the Internet Archive's BookReader, which I noted recently:

https://openlibrary.org/dev/docs/bookreader

GNU Affero Licence, on GitHub:

http://github.com/openlibrary/bookreader

Google gives away its OCR stack in the form of free software.