by api
3969 days ago
... on a Pentium I, using 1990s machine learning algorithms, sure. Nobody's answered my question as to why The Cloud is the magic pixie dust that solves this problem, and why it could not be solved locally with modern compute power and modern ML techniques.
Firstly, the models (particularly the language models) needed for state of the art performance are huge. It's not atypical for papers to discuss using a billion n-grams, for example ( https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/ListenTerm1201415/s... ). That's several gigabytes of memory and storage at the very least, and you'd need a copy of that for every spoken language you'd want to support. Plus you need to keep that up to date with new words and phrases; it's much easier to keep models fresh on a server than on everyone's computer.
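To put rough numbers on the "several gigabytes" claim, here is a back-of-the-envelope sketch. The bytes-per-entry figures are my own assumptions for illustration (they are not taken from the linked paper), and real language models use various compression tricks, so treat the output as an order-of-magnitude estimate only.

```python
def ngram_model_size_gb(num_ngrams, bytes_per_entry):
    """Rough size in GB (10**9 bytes) of a flat n-gram table."""
    return num_ngrams * bytes_per_entry / 1e9


NUM_NGRAMS = 1_000_000_000  # "a billion n-grams", as mentioned above

# Assumed per-entry storage costs (hypothetical, for illustration only):
#    4 bytes: heavily quantized / compressed entry
#    8 bytes: packed word IDs plus a quantized probability
#   16 bytes: word IDs plus float probability and backoff weight
for bytes_per_entry in (4, 8, 16):
    size = ngram_model_size_gb(NUM_NGRAMS, bytes_per_entry)
    print(f"{bytes_per_entry:2d} bytes/entry -> ~{size:.0f} GB per language")
```

Multiply by the number of languages you want to support and the storage cost on a phone or laptop adds up quickly.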
Power and CPU time are also a concern. Big beefy server farms can have trouble keeping up with state of the art speech recognition algorithms; a laptop, tablet or phone, especially when running off a battery, is at a huge disadvantage.
But the biggest advantage of server-based speech recognition is indeed that more data is critical to improving accuracy and performance. There's no data like more data. And you don't just need more data, you need a lot more data. You can get big gains from just doing unsupervised training on 20 million utterances rather than 2 million: http://static.googleusercontent.com/media/research.google.co... There's simply no way you're going to get anything like 20 million utterances without getting data from millions of real world users.