by pmarreck
2 hours ago
My attempts at using AI to do OCR have always resulted in invented artifacts, which isn't feasible for production. Does this suffer from that as well? A simple example is words that are supposed to be in other languages being automatically translated into English, which ruins the effect.
In transcription, you want near certainty, or else you want the word marked as not read with certainty. Yes, context lets you guess, but for some OCR work you want to know when a reading is a guess based on something other than the letters, in order, forming a word.
For example, in a census document on familysearch.com the transcriber "corrected" a name to Joseph. The literal letters in the handwritten document spell Josepth, and sure enough, that's a local variant spelling (Eire).
In another document the writer used "Joh" as an abbreviation; a [human, I assume] transcriber rendered it as John, which is the most likely reading but happens to be wrong.
Sometimes you care that it's guessed, sometimes you want just the best guess.
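A minimal sketch of the "guessed vs. best guess" distinction: keep both the literal letters and a context-based guess for each word, then render either a marked transcription or just the best guess. The `WordReading` record, its `confidence` score, and the 0.9 threshold are all hypothetical; no particular OCR engine is implied to produce data in this shape, and the confidence values below are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class WordReading:
    literal: str       # letters exactly as they appear on the page
    best_guess: str    # context-informed reading (dictionary, common names, ...)
    confidence: float  # hypothetical 0..1 score for the literal reading


def render(words, want_best_guess=False, threshold=0.9):
    """Render readings either as a smooth best guess or as a marked literal transcription."""
    out = []
    for w in words:
        if want_best_guess:
            # Caller only wants the most likely reading, guesses included.
            out.append(w.best_guess)
        elif w.best_guess != w.literal:
            # Keep the literal letters, but show the contextual guess alongside.
            out.append(f"{w.literal} [guess: {w.best_guess}]")
        elif w.confidence < threshold:
            # Letters themselves were hard to read: flag as uncertain.
            out.append(f"{w.literal} [?]")
        else:
            out.append(w.literal)
    return " ".join(out)


# Illustrative readings based on the examples above (confidence values are made up).
words = [
    WordReading("Joh", "John", 0.97),
    WordReading("Josepth", "Joseph", 0.95),
    WordReading("Eire", "Eire", 0.5),
]
print(render(words))
print(render(words, want_best_guess=True))
```

In the default mode, "Joh" and "Josepth" stay as written with their likely expansions flagged, while `want_best_guess=True` produces the smooth "John Joseph" reading for when you just want the best guess.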