by revelation 3397 days ago
It's basically all typewriter font and expertly scanned. What exactly is the issue with using even basic OCR?
2 comments

The Smithsonian is transcribing other, much more difficult works, such as the cursive lab notebook of a historic astrophysicist[0]. I am ridiculously jealous that they are getting this sort of crowd-sourced help to clean data.

Compare to hampanda.com (from Deepgram, YC W16)

[0] https://transcription.si.edu/transcribe/8634/ECOFD

I was wondering this too. There are typographic errors on the cards that are being transcribed verbatim. One example:

"What is the different between a blond and a bruck[sic]?"

"After you lay a brick it does'nt[sic] follow you around for a week."

From the perspective of the artifact, I could imagine that keeping the typos would be reasonable, but from the perspective of searchability it doesn't make a lot of sense to me.
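
For what it's worth, the two goals don't have to conflict: the transcription can stay verbatim while the search layer tolerates typos. Here is a minimal sketch, assuming Python's standard-library difflib. The `card` record and the `fuzzy_contains` helper are hypothetical names made up for illustration, not anything the transcription project is known to use.

```python
import difflib

# Hypothetical record: the verbatim transcription keeps the card's typo
# (the "[sic]" marker is an annotation, so it is left out of the stored text).
card = {"verbatim": "After you lay a brick it does'nt follow you around for a week."}


def fuzzy_contains(query, text, cutoff=0.8):
    """Return True if any word in `text` is a close match for `query`.

    `cutoff` is an assumed similarity threshold between 0 and 1; tune it
    for the collection being searched.
    """
    # Lowercase each word and strip surrounding punctuation, but keep
    # apostrophes so contractions survive as single tokens.
    words = [w.strip('.,?!"').lower() for w in text.split()]
    matches = difflib.get_close_matches(query.lower(), words, n=1, cutoff=cutoff)
    return bool(matches)


# An exact substring search misses the misspelled word...
exact_hit = "doesn't" in card["verbatim"]
# ...while the fuzzy lookup still finds the card.
fuzzy_hit = fuzzy_contains("doesn't", card["verbatim"])
```

Another option along the same lines would be to store a second, spelling-normalized field next to the verbatim one and index only that, at the cost of someone having to produce the corrected text.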