
by dheera 3746 days ago

> It is amazingly easy to create speech recognition without going out to any API these days.

Not really. The hard part is not the algorithm; it is the millions of training samples behind Google's system. They have pretty much every accent and way of speaking covered, which is what allows them to deliver such a high-accuracy, speaker-independent system.

CMUSphinx is remarkable as an academic milestone, but in all honesty it's basically unusable from a product standpoint. If your speech recognition is only 95% accurate, you're going to have a lot of very unhappy users. Average Joes are used to things like microwave ovens, which work 99.99% of the time, and expect new technology to "just work".
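To put rough numbers on that gap, here is a back-of-the-envelope sketch. It assumes the 95% figure is per-word accuracy and that word errors are independent; both are simplifying assumptions for illustration, not measurements of any real system.

```python
def utterance_success_rate(per_word_accuracy: float, num_words: int) -> float:
    """Chance that every word in an utterance is recognized correctly,
    assuming independent per-word errors (a simplification)."""
    return per_word_accuracy ** num_words


# Compare the two accuracy levels mentioned above on a 10-word utterance.
for accuracy in (0.95, 0.9999):
    rate = utterance_success_rate(accuracy, 10)
    print(f"{accuracy:.2%} per word -> {rate:.1%} of 10-word utterances fully correct")
```

Under those assumptions, 95% per word means only about 60% of ten-word commands come out fully right, versus roughly 99.9% at 99.99% per word.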

CMUSphinx is also built on older techniques; AFAIK Google's system is neural-network based.

1 comment

dharma1 3746 days ago

Eesen looks promising; it uses LSTMs and CTC rather than older tech.

https://github.com/yajiemiao/eesen

Baidu open-sourced their CTC implementation:

https://github.com/baidu-research/warp-ctc
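For anyone who hasn't seen CTC: the network emits one label (or a special blank) per audio frame, and decoding merges consecutive repeats and then drops the blanks. Below is a minimal sketch of greedy (best-path) decoding in plain Python. It is not Eesen's or warp-ctc's API; the blank index, the alphabet, and the frame scores are all made up for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores, alphabet):
    """Best-path decoding: pick the top label per frame, merge consecutive
    repeats, then drop blanks."""
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    collapsed = [label for label, _ in groupby(best_path)]
    return "".join(alphabet[label] for label in collapsed if label != BLANK)


# Toy per-frame scores over the alphabet [blank, 'c', 'a', 't'] (invented values).
alphabet = ["", "c", "a", "t"]
frames = [
    [0.1, 0.8, 0.05, 0.05],  # c
    [0.1, 0.7, 0.1, 0.1],    # c again (merged with the previous frame)
    [0.9, 0.0, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.7, 0.1],    # a
    [0.1, 0.1, 0.1, 0.7],    # t
    [0.1, 0.1, 0.1, 0.7],    # t again (merged with the previous frame)
]
print(ctc_greedy_decode(frames, alphabet))
```

The blank is what lets CTC output genuine double letters: two identical labels separated by a blank frame survive the merge step. Note that warp-ctc itself computes the CTC loss used for training; decoding is the simpler half sketched here.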

I think that within 1-2 years we will have an easy-to-install OSS speech recognition library with accurate pretrained networks, not far off from Google/Alexa/Baidu and running locally rather than in the cloud. Can't wait.
