by gok 3974 days ago
There are several tremendous advantages to server-based speech recognition.

Firstly, the models (particularly the language models) needed for state of the art performance are huge. It's not atypical for papers to discuss using a billion n-grams, for example ( https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/ListenTerm1201415/s... ). That's several gigabytes of memory and storage at the very least, and you'd need a copy of that for every spoken language you'd want to support. Plus you need to keep that up to date with new words and phrases; it's much easier to keep models fresh on a server than on everyone's computer.
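To put rough numbers on "several gigabytes", here's a back-of-envelope sketch. The per-entry byte costs are my own illustrative assumptions, not figures from the linked paper; real toolkits vary a lot depending on quantization and the data structure used.

```python
def lm_size_gib(num_ngrams: int, bytes_per_ngram: float) -> float:
    """Rough in-memory size of an n-gram language model, in GiB."""
    return num_ngrams * bytes_per_ngram / 2**30


NUM_NGRAMS = 1_000_000_000  # "a billion n-grams", as mentioned above

# Hypothetical per-entry costs in bytes (assumptions, not measurements).
ASSUMED_COSTS = [
    ("tightly packed / quantized", 4),
    ("probability plus back-off weight", 8),
    ("naive hash map with overhead", 24),
]

for label, cost in ASSUMED_COSTS:
    print(f"{label}: {lm_size_gib(NUM_NGRAMS, cost):.1f} GiB per language")
```

Even the most optimistic assumption lands in the multi-gigabyte range, and that's before multiplying by the number of languages you want to support.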

Power and CPU time are also a concern. Even big beefy server farms can have trouble keeping up with state of the art speech recognition algorithms; a laptop, tablet or phone, especially when running off a battery, is at a huge disadvantage.

But the biggest advantage to server-based speech recognition is indeed that more data is critical to improving accuracy and performance. There's no data like more data. And you don't just need more data, you need a lot more data. You can get big gains from just doing unsupervised training on 20 million utterances rather than 2 million: http://static.googleusercontent.com/media/research.google.co... There's simply no way you're going to get anything like 20 million utterances without getting data from millions of real world users.

2 comments

This isn't actually true.

The large data size affects the training, but the model itself is pretty small now (after some hard work on Google's part).

The thing everyone seems to be missing is that Android's (English) voice recognizer is offline[1]. While you can use the online model, I suspect that is more about continual updates to the model (so it understands new words, changing accents, etc.) than about recognition itself.

[1] http://stackoverflow.com/questions/17616994/offline-speech-r...

Android's speech recognizer has a compact/offline mode, but that's definitely not what's run by default.
Good speech recognition is that expensive?

... and people think sentient AI is on the horizon. :P