by eserorg 6060 days ago
Sweet!

I'm hacking together a perl script right now using the Mechanical Turk API.

It's automatically splitting up the mp3 files, generating the HTML forms, and loading them into MTurk.

I'm going to use 2x coverage for each chunk and see what happens.
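
For anyone curious what the chunk planning might look like, here is a rough sketch in Python (not the Perl script itself). It only plans the time windows and the duplicated tasks; it does not cut audio or call MTurk. The names `plan_chunks`, `plan_tasks`, and `Task`, the 60-second chunk length, and the 2-second overlap are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    chunk_index: int   # which chunk of the recording this task covers
    start_s: float     # window start, in seconds from the beginning
    end_s: float       # window end, in seconds from the beginning
    replica: int       # 0 .. coverage-1; one task per independent worker


def plan_chunks(duration_s, chunk_s=60.0, overlap_s=2.0):
    """Return (start, end) windows that together cover [0, duration_s].

    Consecutive windows overlap by overlap_s so that a word cut at a
    boundary appears whole in at least one chunk.
    """
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    windows = []
    start = 0.0
    step = chunk_s - overlap_s
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        start += step
    return windows


def plan_tasks(duration_s, coverage=2, **chunk_options):
    """Create `coverage` identical tasks per chunk (2x coverage by default)."""
    return [
        Task(index, start, end, replica)
        for index, (start, end) in enumerate(plan_chunks(duration_s, **chunk_options))
        for replica in range(coverage)
    ]


if __name__ == "__main__":
    for task in plan_tasks(150, coverage=2):
        print(task)
```

Each `Task` would then become one form loaded into MTurk, and the two results per chunk can be compared against each other to spot bad transcripts.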

This is a brilliant idea. Thank you for the suggestion! I can't believe it didn't occur to me before -- and we are _very_ heavy users of AWS.

2 comments

We had initially considered using MTurk as the backend for our transcription service, but found it difficult to tailor to the transcription process we had in mind.

The system we use now has multiple stages. We split the files into smaller chunks, which are then picked up by our transcribers. Each transcript is then reviewed, speaker initials and timestamps are added, and finally the chunks are collated.
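
As a rough illustration of the final collation stage only, here is a minimal Python sketch. It is not our actual system; `Segment`, `format_timestamp`, `collate`, and the output line format are hypothetical names and choices made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    offset_s: int   # where the chunk starts in the original recording, in seconds
    speaker: str    # speaker initials added during review
    text: str       # reviewed transcript text for this chunk


def format_timestamp(seconds):
    """Render a second count as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def collate(segments):
    """Join reviewed chunks into one transcript, ordered by start time."""
    ordered = sorted(segments, key=lambda seg: seg.offset_s)
    return "\n".join(
        f"[{format_timestamp(seg.offset_s)}] {seg.speaker}: {seg.text}"
        for seg in ordered
    )


if __name__ == "__main__":
    chunks = [
        Segment(60, "JB", "Second chunk."),
        Segment(0, "AK", "First chunk."),
    ]
    print(collate(chunks))
```

Sorting by offset means the chunks can come back from transcribers in any order and still land in the right place.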

We've gotten pretty decent results with our system so far with some very satisfied customers.

More about our process at http://callgraph.biz/transcriptionservice#process

So CastingWords has obviously made MTurk work, and work well, but finding ways to maintain quality levels and consistency has not been an easy process. We've had to give up completely on "quick and dirty" (i.e. super-cheap) transcription. Instead we focus on a much higher-quality product, and we maintain quality by using many, many (mainly QA) steps: the shortest, simplest transcript is seen by 14 turkers and runs through a 7-step process. With longer audio, the pipeline gets even longer.