by abakker
2671 days ago
But if I understand correctly, systems can be trained separately on "this is background noise", apply those filters first, and then work with the cleaned audio, right? I've been using krisp.ai for a few weeks, and it has been fantastic at doing exactly that in real time. Regarding conversational speech, I get that: books are definitely not conversational. I guess the next question, though, would be: is the objective to build a model that understands all words, or one that understands conversational speech? <novice> It seems like transfer learning (training a model on audiobooks first, then fine-tuning it on conversations) would still be a good path, right? </novice>
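A crude way to picture the two-stage idea above (learn what the background sounds like from a noise-only clip, then filter the audio before recognition) is a frame-level noise gate. This is a hypothetical Python sketch with illustrative function names; it is not how krisp.ai or any particular product works, and the `margin` threshold is an arbitrary assumption.

```python
import math


def rms(frame):
    """Root-mean-square energy of a frame (a list of float samples)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def split_frames(samples, size):
    """Split samples into consecutive, non-overlapping frames; the last may be shorter."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def fit_noise_floor(noise_samples, frame_size):
    """'Train' on a noise-only clip: the average frame RMS becomes the noise floor."""
    frames = split_frames(noise_samples, frame_size)
    if not frames:
        raise ValueError("need at least one sample of background noise")
    return sum(rms(f) for f in frames) / len(frames)


def noise_gate(samples, noise_floor, frame_size, margin=2.0):
    """Silence frames whose energy is not clearly above the learned noise floor,
    so a downstream recognizer only sees frames likely to contain speech."""
    cleaned = []
    for frame in split_frames(samples, frame_size):
        if rms(frame) > margin * noise_floor:
            cleaned.extend(frame)               # keep: louder than the background
        else:
            cleaned.extend([0.0] * len(frame))  # drop: looks like background
    return cleaned


noise = [0.01, -0.01] * 8                       # noise-only "training" clip
floor = fit_noise_floor(noise, frame_size=4)
audio = [0.01, -0.01, 0.01, -0.01, 0.5, -0.5, 0.5, -0.5]
print(noise_gate(audio, floor, frame_size=4))   # quiet first frame is zeroed
```

A gate like this only removes noise during pauses; noise that overlaps speech passes straight through, which is why practical suppressors typically work per frequency band (e.g. spectral subtraction) or use trained neural models.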
In any case, for read speech in particular there are several corpora available already, including the moderately large LibriSpeech corpus (1,000 hours). State-of-the-art accuracy on read speech is also very good; for example, domain-specific dictation systems have been commercially viable for quite some time. So while it's true that audiobooks are a large untapped source, I think there are other large-scale and richer options, like YouTube or movies (i.e., videos with speech for which subtitles are available), that would be more useful for making progress toward good speech recognition systems.
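As a hypothetical sketch of how subtitled video might be turned into training data, the Python below parses subtitles into timed text segments, each of which could be paired with the matching slice of audio. It assumes the subtitles come as SubRip (`.srt`) text; `Segment` and `parse_srt` are illustrative names, and the audio-cutting step is left out.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:05,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str     # subtitle text, used as a (noisy) transcript


def to_seconds(h, m, s, ms):
    """Convert captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(srt_text):
    """Split SRT text into cues and return one Segment per cue."""
    segments = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                text = " ".join(lines[i + 1:])
                text = re.sub(r"<[^>]+>", "", text)  # strip tags like <i>...</i>
                segments.append(
                    Segment(to_seconds(*fields[:4]), to_seconds(*fields[4:]), text)
                )
                break
    return segments


srt = "1\n00:00:01,000 --> 00:00:03,500\nHello there.\n\n2\n00:00:04,000 --> 00:00:06,250\n<i>Where are</i> we going?\n"
for seg in parse_srt(srt):
    print(seg)
```

Subtitles are often loosely timed or paraphrase the dialogue, so in practice segments like these usually need filtering or re-alignment against the audio before they are useful as training labels.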