I also want to emphasize the importance of listening (validating) as well as recording. Validation is a big part of the puzzle when building data that is viable for machine learning.
One thing that wasn't entirely clear to me is how strict you have to be when validating. For example, I encountered one recording that was completely silent, and I figured that had to be marked as invalid. However, another one was barely audible, but by listening intently I could recognise that it pronounced the right words. Is that OK?
And should we validate whether they match the proper accents as well? e.g. if I hear a clear Dutch accent, I presume you wouldn't want that labelled "native British speaker"?
Can I also suggest encouraging users to get recordings from their children as well, as most speech recognition libraries are pretty poor with children's voices. (IMO Alexa Voice Service is by far the best with children's voices.)