by computerex
2743 days ago
The built-in offline Android speech recognizer is really bad. Giants like Google and Facebook are blessed with data, so they can train state-of-the-art speech recognition models (much, much better than what you get out of the built-in Android recognizer) and then provide speech recognition as a service. They control the recognition because it happens on their servers and is independent of Android or any other OS. So FB, for instance, can send some voice data to its servers and get text back, and then run text sentiment analysis on it to get further context about the message. Sadly, most people don't have the speech data to train their own recognizers for large-vocabulary systems, and it's even harder for languages other than English. With the exception of Google/Amazon/FB/Microsoft/Baidu/etc., everyone else has to use the APIs offered by those companies to do high-fidelity recognition. Which sucks, because every recognition costs money: you have to pay someone else to do it. Whereas FB/Amazon/MS/Baidu/etc. can do high-fidelity, large-vocabulary recognition on their own infrastructure, without depending on anyone else's API, and offer it as a service. THIS is why FB wants to make speech recognition systems.
|
I wonder if you could bootstrap a sizable speech dataset by trawling audio off YouTube and then using one of the really good cloud speech recognition services to label it. :)
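
For what it's worth, here is a rough sketch of what that labeling loop might look like. Everything in it is an assumption for illustration: `pseudo_label`, `LabeledClip` and `fake_transcribe` are made-up names, the transcriber is passed in as a plain callable rather than tied to any real cloud API, and the confidence cutoff is my own addition (machine-generated labels are noisy, so you would probably want to discard the shaky ones).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class LabeledClip:
    clip_id: str
    transcript: str
    confidence: float


# Hypothetical interface: takes raw audio bytes, returns (text, confidence in [0, 1]).
# A real version would wrap whichever cloud speech service you are paying for.
Transcriber = Callable[[bytes], Tuple[str, float]]


def pseudo_label(
    clips: Iterable[Tuple[str, bytes]],
    transcribe: Transcriber,
    min_confidence: float = 0.9,
) -> List[LabeledClip]:
    """Label audio clips with a transcriber and keep only confident, non-empty results."""
    labeled = []
    for clip_id, audio in clips:
        text, confidence = transcribe(audio)
        text = text.strip()
        # Drop empty transcripts and anything the service itself was unsure about.
        if text and confidence >= min_confidence:
            labeled.append(LabeledClip(clip_id, text, confidence))
    return labeled


def fake_transcribe(audio: bytes) -> Tuple[str, float]:
    """Stand-in for a cloud recognizer: treats the bytes as text, for demo purposes only."""
    text = audio.decode("utf-8", errors="ignore")
    confidence = 0.95 if len(text) > 5 else 0.5
    return text, confidence


if __name__ == "__main__":
    demo_clips = [("a", b"hello world"), ("b", b"hm"), ("c", b"   ")]
    for clip in pseudo_label(demo_clips, fake_transcribe):
        print(clip.clip_id, clip.transcript, clip.confidence)
```

The catch is the same one as above: every call to the real transcriber costs money, and a model trained on these labels inherits whatever mistakes the labeling service makes.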