by pbhjpbhj
1 hour ago
You almost don't want [super-]word level ML (i.e. word-pair/phrase/sentence/document/corpus level). In transcription you want near certainty, or you want a marker that the word could not be read with certainty. Yes, context lets you guess, but for some OCR you want to know when a reading is a guess based on something other than the letters, in order, forming a word. For example, in a census document on familysearch.com the transcriber "corrected" a name to Joseph. The literal letters in the handwritten document spell Josepth ... and sure enough that's a local variant spelling (Eire). In another document the writer used "Joh" as an abbreviation, and a [human, I assume] transcriber put that as John ... which is the most likely reading, but happens to be wrong. Sometimes you care that it's guessed, sometimes you just want the best guess.
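
To make the distinction concrete, here is a rough sketch of keeping the literal reading separate from any context-based guess, so the reader can choose between "show me what's on the page" and "show me the best guess". The `Reading` class and `render` function are hypothetical names made up for illustration, not any real OCR tool's API; the sample data is just the two names from the examples above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """One transcribed word (hypothetical structure, for illustration only)."""
    literal: str                 # the letters as actually written on the page
    guess: Optional[str] = None  # context-based correction/expansion, if any


def render(readings, mark_guesses=True):
    """Join readings into text, either flagging guesses or silently applying them."""
    out = []
    for r in readings:
        if r.guess is None or r.guess == r.literal:
            # Nothing was inferred from context: emit the literal letters.
            out.append(r.literal)
        elif mark_guesses:
            # Keep the literal reading and show the guess as a marked suggestion.
            out.append(f"{r.literal} [{r.guess}?]")
        else:
            # Caller only wants the best guess.
            out.append(r.guess)
    return " ".join(out)


# "Josepth" is left alone (it is what the page says); "Joh" carries a guess.
line = [Reading("Josepth"), Reading("Joh", guess="John")]

print(render(line))                      # literal text with guesses marked
print(render(line, mark_guesses=False))  # best guess only
```

The point is only that the guess never overwrites the literal reading, so whether to trust it stays a decision for whoever reads the transcript.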
|
A nitpick, because it's often a dogwhistle: almost nobody in Ireland calls it that when speaking English. And it's still incorrect in Irish; the correct spelling is Éire.