The Smithsonian is transcribing other, much more difficult works, such as the cursive lab notebook of a historic astrophysicist[0]. I am ridiculously jealous that they are getting this sort of crowd-sourced help to clean data.
I was wondering about this too. There are typographic errors on the cards that are being transcribed verbatim. One example:
"What is the different between a blond and a bruck[sic]?"
"After you lay a brick it does'nt[sic] follow you around for a week."
From the perspective of the artifact, I could imagine that keeping the typos would be reasonable, but from the perspective of searchability it doesn't make a lot of sense to me.
Interesting project, but there are so many design problems with the approach to involving users.
Use OCR first, then use humans to verify.
Next, present a task right up front that anyone can help with -- draw people right in. Don't make users "look for work" and minimize/eliminate the need for training.
For example: "If there's a date shown, enter it here ______" (with an option for "no date").
Or, "Correct this text as it appears on the card: ________"
Or, "Is there an attribution/credit mentioned? If so, enter it here ____________________"
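A minimal sketch of that flow, assuming an OCR pass has already produced a text guess for each card. The names here (Card, make_tasks, record_answer) are hypothetical and not part of the actual project; the point is only that each volunteer sees one small prompt, pre-filled where OCR has a guess, with "no date" as an explicit answer.

```python
from dataclasses import dataclass, field

NO_DATE = "no date"  # explicit option offered on the date prompt


@dataclass
class Card:
    card_id: str
    ocr_text: str  # assumed to come from an OCR step run beforehand
    answers: dict = field(default_factory=dict)


def make_tasks(card):
    """Build the small prompts; only the text prompt is pre-filled from OCR."""
    return [
        {"field": "date",
         "prompt": "If there's a date shown, enter it here",
         "prefill": "", "allow": NO_DATE},
        {"field": "text",
         "prompt": "Correct this text as it appears on the card",
         "prefill": card.ocr_text, "allow": None},
        {"field": "credit",
         "prompt": "Is there an attribution/credit mentioned? If so, enter it here",
         "prefill": "", "allow": None},
    ]


def record_answer(card, task, response):
    """Store one volunteer response; an empty response accepts the prefill."""
    response = response.strip()
    if task["allow"] is not None and response.lower() == task["allow"]:
        card.answers[task["field"]] = None  # e.g. the "no date" option
    else:
        card.answers[task["field"]] = response or task["prefill"]
```

A volunteer who leaves the text prompt unchanged confirms the OCR guess, so the human effort goes into verifying and correcting rather than typing every card from scratch.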
If you're ever in Washington, D.C., you can view Bob Hope's joke file at the Library of Congress, where there's a special exhibit on him.
Hope's career started in vaudeville, then moved to radio, the movies and finally TV. Interestingly, he did several movies with Phyllis Diller, and she was on a lot of his TV specials.