by punchingwater 2905 days ago

In the early days of this project, before we shipped the website (i.e. around March 2017), we did some exploration with Mechanical Turk. The problem with the Mech Turk approach is that for recording voices you need a lot of different people speaking (i.e. tens of thousands). For languages other than English, Mech Turk simply doesn't have those kinds of numbers. And English is not that interesting to us anyway, since public English data already exists (see LibriSpeech). There are of course other micro-task platforms popular in other countries (for instance, there are a myriad in Indonesia), but we didn't have the time to manage jobs on all these different platforms.

However, Mech Turk is better suited to things like validation, since you only need a handful of people doing the majority of the work.

In any case, I have some very hacky tools we used for this exploration, if you are interested: https://github.com/mikehenrty/mech-turk/