by coroxout 4153 days ago
It's precisely those idiosyncrasies of early modern orthography that make it difficult to use an off-the-shelf OCR package, and that is presumably why these are hand-transcribed instead.

Perhaps there is a specialist antiquarian OCR package that can deal with long s, interchangeable u and v, non-standardised spelling, etc., but I have yet to come across one.
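To show why even the "easy" conventions need special handling, here is a minimal sketch of post-transcription normalisation for search. It is not an OCR tool and not how any existing project does it. It assumes the text is already transcribed with the long s (U+017F) preserved. It also relies on the common printing convention that v is written word-initially and u elsewhere, whatever the sound. The function name and the heuristic rules are illustrative assumptions, and they will misfire on some words. They also leave genuinely non-standardised spellings alone.

```python
import re

def normalise_early_modern(text: str) -> str:
    """Heuristically map a few early modern print conventions to modern spelling.

    Assumed rules (illustrative only, not exhaustive):
      * long s (U+017F) -> s
      * word-initial v before a consonant -> u   ("vnto" -> "unto")
      * u between two vowels -> v                ("haue" -> "have")
    Other non-standardised spellings are left untouched.
    """
    # Long s is just a positional variant of 's'.
    text = text.replace("\u017f", "s")

    # Word-initial v/V followed by a consonant letter (not a vowel, not v,
    # not a digit/underscore/non-word char) was usually the vowel u.
    text = re.sub(
        r"\b([vV])(?=[^aeiouvAEIOUV\W\d_])",
        lambda m: "u" if m.group(1) == "v" else "U",
        text,
    )

    # A lowercase u sitting between two vowels was usually consonantal v.
    text = re.sub(r"(?<=[aeiouAEIOU])u(?=[aeiouAEIOU])", "v", text)
    return text


print(normalise_early_modern("Thou ſhalt loue thy neighbour, vnto whom I haue ſpoken."))
# prints: Thou shalt love thy neighbour, unto whom I have spoken.
```

Rules this crude break on plenty of real words, and they do nothing for variant spellings. That is much of why a purpose-built tool, or careful human transcription, is needed.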


Have you looked at the Early Modern OCR Project? My understanding is that they're working on exactly that, as well as on better tools for reviewing and retraining at scale:

http://emop.tamu.edu/

No, I hadn't, and am grateful for the link - thank you!