HN Mirror



	by punchingwater 3123 days ago
	Thank you so much! I also want to emphasize the importance of listening (validating) as well as recording. Validation is a big part of the puzzle when building data that is viable for machine learning.


Vinnl 3122 days ago

One thing that wasn't entirely clear to me is how strict you have to be when validating. For example, I encountered one recording that was completely silent, and I figured that had to be marked as invalid. However, another one was barely audible, but by listening intently I did recognise that the right words were pronounced. Is that OK?

And should we validate whether they match the proper accents as well? e.g. if I hear a clear Dutch accent, I presume you wouldn't want that labelled "native British speaker"?


RubenSandwich 3123 days ago

Can I suggest encouraging users to get recordings from their children as well? Most speech recognition libraries are pretty poor with children's voices. (IMO, Alexa Voice Service is by far the best with children's voices.)


yjftsjthsd-h 3123 days ago

Is that okay legally? Maybe parental permission is enough.
