by punchingwater
3261 days ago
I can tell from your comment (and its responses) that the language on our homepage is a bit confusing, so thank you for the feedback. To answer your question: Common Voice is about building a collection of labelled voice data (i.e., sentence clips with transcripts) that can be used to, for instance, train speech-to-text algorithms. Part of the goal of this project, though, is to figure out how this data can best help people build voice technology, so it's pretty open-ended at this point.

Mozilla does have an open source speech-to-text engine, DeepSpeech [1], that we are developing, and we hope one day to use the Common Voice data to train it. DeepSpeech and Common Voice are related but separate projects, if that makes sense.

As for LibriSpeech, the DeepSpeech team at Mozilla does use this data for training. However, the language is pretty antiquated, and we only get about 1K hours of data, whereas you need about 10K hours to get to a decent accuracy (a word error rate, or WER, of 10% or below). Common Voice is about adding to public corpora like LibriSpeech, not replacing them.

[1] https://github.com/mozilla/DeepSpeech
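For anyone unfamiliar with the metric: WER is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer's output, divided by the number of words in the reference. Here is a minimal sketch, assuming plain whitespace tokenization and no text normalization. The function name `word_error_rate` is made up for illustration and is not part of DeepSpeech or Common Voice.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumption: words are split on whitespace, with no case folding
    or punctuation stripping (real evaluations usually normalize first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

With this definition, one wrong word in a ten-word reference gives a WER of 0.1, which is the 10% threshold mentioned above. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.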
I would also not use "voice technology" as the generic term for speech recognition, text-to-speech, and whatever else you want to do with this data. Rather, "speech technology" is the common term that covers all of this (https://en.wikipedia.org/wiki/Speech_technology).