by danso
2758 days ago
Even if AWS goes the cynical route of making Textract an upsell to MTurk -- e.g. the Textract output is not reliable enough on its own, but is structured for easy piping to an MTurk job -- that's got to be useful for the many folks who send entire pages to MTurk when they just need a couple boxes proofread. As an example of a more scripted/structured job, ProPublica built out a crowdsourcing framework in Rails to extract data from FCC filings. But even that was quite difficult, because every state/TV station has its own kind of form: https://projects.propublica.org/free-the-files/