by computerex
2743 days ago
Well, let's forget the offline Android recognizer. That's the one that's built in and doesn't go to Google over the internet to get a more accurate transcription. It's fairly good for what it is, but it doesn't come close to the accuracy you get when you go to Google's recognition servers via their APIs. That's because the models behind those recognition services are much larger, more robust, and better than what you get straight out of Android. These services, offered by companies such as Google, do not adapt the acoustic model to individual speakers, which is why they are described as speaker-independent.

Secondly, when I say "train", I mean it in a totally different sense from how you seem to be using the term. You are using it to mean adapting an acoustic model to an individual speaker to improve performance. I am talking about building the initial model. Typical RNN or even convolution-based algorithms require a lot of time and processing power to train. Even harder to get than the processing power, though, is of course the data to train on.