by philers 4153 days ago
In fact, those mistakes look more like accurate transcriptions of Early Modern manuscripts - with their looser spelling rules and often idiosyncratic use of letters.

It's kind of interesting that they look like the same errors as those generated by OCR.

The difficulty of deciphering the text makes this huge task even more impressive!


It's precisely those idiosyncrasies of early modern orthography which make it difficult to use an off-the-shelf OCR package, which is presumably why these are hand-transcribed instead.

Perhaps there is a specialist antiquarian OCR package that can deal with the long s (ſ), interchangeable u and v, non-standardised spelling, etc., but I have yet to come across one.
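
To give a rough idea of what "dealing with" the long s and u/v would involve, here is a minimal Python sketch of a normalisation pass over already-transcribed words. The function name `normalise` and the two u/v rules are my own simplifying assumptions for illustration, not taken from any existing OCR package, and real texts have plenty of exceptions (capitals, Latin, abbreviations) that this ignores.

```python
import re

LONG_S = "\u017f"  # the long s character: ſ


def normalise(word: str) -> str:
    """Map an early modern spelling to a rough modern form (illustrative only)."""
    # Long s is just a positional variant of 's'.
    w = word.replace(LONG_S, "s")
    # Assumed rule: word-initial 'v' before a consonant stands for a vowel,
    # e.g. 'vnto' -> 'unto', 'vp' -> 'up'.
    w = re.sub(r"\bv(?=[^aeiouy\W])", "u", w)
    # Assumed rule: 'u' between two vowels stands for a consonant,
    # e.g. 'loue' -> 'love', 'haue' -> 'have'.
    w = re.sub(r"(?<=[aeiou])u(?=[aeiou])", "v", w)
    return w


if __name__ == "__main__":
    for sample in ["vnto", "loue", "haue", "ſelfe", "very"]:
        print(sample, "->", normalise(sample))
```

Note that this only touches letter forms: 'ſelfe' comes out as 'selfe', not 'self', because non-standardised spelling needs a dictionary or a trained model rather than a couple of regexes.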

Have you looked at the Early Modern OCR Project? My understanding is that they're working on exactly that, as well as on better tools for reviewing and retraining at large scale:

http://emop.tamu.edu/

No, I hadn't, and am grateful for the link - thank you!