by punchingwater 3123 days ago
Thank you so much!

I also want to emphasize the importance of listening (validating) as well as recording. Validation is a big part of the puzzle for building data that is viable for machine learning.

2 comments

One thing that wasn't entirely clear to me is how strict you have to be when validating. For example, I encountered one recording that was completely silent, and I figured that had to be marked as invalid. However, another one was barely audible, but by listening intently I could recognise that the right words were spoken. Is that OK?

And should we validate whether the speaker matches the labelled accent as well? E.g. if I hear a clear Dutch accent, I presume you wouldn't want that labelled "native British speaker"?

Can I suggest encouraging users to get recordings from their children as well? Most speech recognition libraries are pretty poor with children's voices. (IMO Alexa Voice Service is by far the best with children's voices.)
Is that okay legally? Maybe parental permission is enough.