by truthexposer 3386 days ago

I believe the voices sound robotic because of how little audio the system needs to generate a "usable" voice.

Speech models usually use triphones, and covering them turns out to require a huge amount of audio. That makes this system particularly impressive, given how little data it needs.
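To put a rough number on why triphones need so much audio, here is a back-of-the-envelope sketch. It assumes a 39-phoneme inventory (the ARPAbet set used by the CMU Pronouncing Dictionary); the phoneme set and the `triphone_count` helper are my own illustration, not something taken from the system being discussed.

```python
# Rough upper bound on distinct triphones, where a triphone is a phoneme
# taken together with its left and right neighbours.
# Assumption: 39 phonemes, as in the ARPAbet set of the CMU Pronouncing
# Dictionary. The phoneme set is not named anywhere in this thread.
def triphone_count(n_phonemes: int) -> int:
    """Number of ordered (left, centre, right) phoneme triples."""
    return n_phonemes ** 3


print(triphone_count(39))  # 59319
```

Not every triple occurs in real speech, and practical systems usually share parameters between similar triphones, so this is only an upper bound. It still shows why recording several examples of each context adds up quickly.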

Google used their own datasets, which are most likely massive.