by ezy 5103 days ago
This approach still uses HMMs; the only change is that the observation probabilities now come from a DNN (deep neural network) instead of a GMM (Gaussian mixture model). "Senones" are not new: HTK can use various context-dependent phoneme models, and the HMM states (typically 3) within each context-dependent phoneme essentially boil down to what they call a "senone" here. Interestingly, they use GMMs to bootstrap the DNN training, which I suppose you could avoid once you have a reasonable DNN lying around.
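
To make the "DNN output feeding an HMM decoder" idea concrete, here is a rough sketch of how such a hybrid is commonly wired up. It is not taken from the article: it assumes the network emits per-frame state posteriors p(state | frame), which are divided by state priors to get scaled likelihoods and then passed to an ordinary Viterbi decoder. The function names, the 3-state left-to-right topology, and every number below are made up for illustration.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF

def scaled_log_likelihoods(posteriors, priors):
    """Turn DNN posteriors p(s|x) into scaled log-likelihoods.

    By Bayes' rule p(x|s) is proportional to p(s|x) / p(s); the dropped
    p(x) term is the same for every state, so it does not affect decoding.
    Priors are assumed to be strictly positive.
    """
    return [[safe_log(p) - math.log(q) for p, q in zip(frame, priors)]
            for frame in posteriors]

def viterbi(log_obs, log_trans, log_init):
    """Return the most likely state sequence for the given log scores."""
    n_states = len(log_init)
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backpointers = []
    for frame in log_obs[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this frame.
            best = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            ptr.append(best)
            new_score.append(score[best] + log_trans[best][s] + frame[s])
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Toy stand-in for DNN output: 5 frames, 3 HMM states (hypothetical values).
posteriors = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
]
priors = [1 / 3, 1 / 3, 1 / 3]  # assumed uniform state priors
trans = [[0.6, 0.4, 0.0],       # left-to-right topology
         [0.0, 0.6, 0.4],
         [0.0, 0.0, 1.0]]
init = [1.0, 0.0, 0.0]          # always start in the first state

log_obs = scaled_log_likelihoods(posteriors, priors)
log_trans = [[safe_log(p) for p in row] for row in trans]
log_init = [safe_log(p) for p in init]
path = viterbi(log_obs, log_trans, log_init)
print(path)
```

In the GMM version, `log_obs` would come from evaluating each state's mixture density on the frame; the decoder itself does not change, which is the sense in which the HMM machinery is still there.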

The main difference here is hooking the DNN output up to an HMM decoder in place of the GMMs and, possibly even more important, the training process they use to get the DNN fairly efficiently. That's the biggest thing: GMMs, at least the last time I looked, can be trained and adapted much more quickly than a DNN.