by truthexposer
3341 days ago
I believe the voices sound robotic because of how little audio the system needs to generate a "usable" voice. Speech models usually use triphones, which end up requiring a huge amount of audio to cover. That makes this particularly impressive, given how little data they need. Google used their own datasets, which are most likely massive.
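
To put a rough number on the triphone point: a triphone is a phoneme taken together with its left and right neighbors, so the number of possible contexts grows with the cube of the phoneme inventory. Here is a minimal sketch, assuming an inventory of about 40 phonemes (a common ballpark for English, not a figure from the system being discussed); the function name and the inventory size are my own illustrative choices.

```python
def triphone_count(num_phonemes: int) -> int:
    """Upper bound on distinct triphones: one choice each for the
    left neighbor, the center phoneme, and the right neighbor."""
    return num_phonemes ** 3


# Assumed inventory size (hypothetical; real systems vary by language and phone set).
ASSUMED_PHONEMES = 40

print(triphone_count(ASSUMED_PHONEMES))  # 64000
```

Not every combination actually occurs in a language, and real systems usually tie or cluster similar contexts, so this is only an upper bound. It still shows why covering triphones well takes a lot of recorded speech.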