| HN Mirror



	by coroxout 4200 days ago
	It's precisely those idiosyncrasies of early modern orthography that make it difficult to use an off-the-shelf OCR package, which is presumably why these are hand-transcribed instead. Perhaps there is a specialist antiquarian OCR package that can deal with long s, interchangeable u and v, non-standardised spelling, etc., but I have yet to come across one.

1 comments

acdha 4200 days ago

Have you looked at the Early Modern OCR Project (eMOP)? My understanding is that they're working on exactly that, as well as simply better tools for reviewing & retraining on a large scale:

http://emop.tamu.edu/


coroxout 4199 days ago

No, I hadn't, and am grateful for the link - thank you!
