Hacker News
by jjoonathan 1745 days ago
What does the state of the art look like? I'm not following closely enough to know; I only get an update when something leaks into the meme-o-sphere, like the recent "World Leaders singing Numa Numa": https://www.youtube.com/watch?v=Y8CKlTlQA64

Here's my stupid text-to-speech website:

https://vo.codes

You can also try the alpha of "version 2" here. It improves synthesis and vocoder quality, and it also lets people contribute:

https://api.vo.codes/enable

This is far from state of the art, but it shows the other end of the spectrum: an easy toy that anybody can use.

I'm working on a photo-to-3D rigged model system and on model style transfer. You can follow me on Twitch or Twitter to see those when they're ready.

Very cool! This is fun, and I can see big improvements from v1 to v2. I look forward to watching this evolve!

https://vo.codes/tts/result/TR:s1rj02g34ppc7bhq1m8bf6p3thny6

Thanks!!

A few areas for improvement in the clip you posted:

I need to add better duration estimation; unfortunately, the clip is truncated.

A lot of the community-trained voices don't fully leverage phonetic annotation, so some of the words fall flat.

I think the synthesizer has too much noise in it (you can see this in the image). The person who trained it probably used noisy data.

Finally, the universal vocoder isn't handling James Earl Jones' deep voice very well; it should be fine-tuned.
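On the truncation point: in Tacotron-style autoregressive synthesizers, the decoder typically runs until a stop token fires or a hard cap on mel frames is reached, so a cap that is too low can cut long or slowly spoken lines short. Assuming that kind of setup (not confirmed for vo.codes), a minimal sketch of sizing the cap from the input text might look like this. The 22,050 Hz sample rate and hop length of 256 are common defaults, not vo.codes' actual settings, and `estimate_max_frames`, `seconds_per_char`, and `margin` are hypothetical names with guessed values:

```python
import math

# Assumed audio settings: common Tacotron-style defaults, not vo.codes' real config.
SAMPLE_RATE = 22050  # audio samples per second
HOP_LENGTH = 256     # audio samples between consecutive mel frames


def estimate_max_frames(text: str,
                        seconds_per_char: float = 0.08,
                        margin: float = 1.5,
                        min_frames: int = 100) -> int:
    """Hypothetical heuristic: budget decoder frames from the text length.

    seconds_per_char and margin are illustrative guesses; a real system
    would fit them per voice, since slow, deep voices need more time.
    """
    chars = len(text.strip())
    est_seconds = chars * seconds_per_char * margin  # padded duration guess
    # Convert seconds to mel frames: frames/sec = SAMPLE_RATE / HOP_LENGTH.
    frames = math.ceil(est_seconds * SAMPLE_RATE / HOP_LENGTH)
    return max(frames, min_frames)  # never allow a uselessly small cap


if __name__ == "__main__":
    print(estimate_max_frames("hello world"))
```

With these defaults, `estimate_max_frames("hello world")` gives 114 frames (about 1.3 seconds). A deep, slow voice like the one in the clip would probably need a larger `seconds_per_char`, which is one reason a per-voice estimate tends to beat a single global constant.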
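On the phonetic annotation point: one common convention, used for example by NVIDIA's Tacotron 2 text frontend, is to embed ARPAbet phonemes in curly braces so the model doesn't have to guess pronunciation from spelling. A minimal sketch, assuming a frontend that accepts that notation. The two-word `LEXICON` and the `annotate` helper are hypothetical stand-ins for a full pronouncing dictionary such as CMUdict:

```python
import re

# Tiny hypothetical lexicon; a real one would come from a full
# pronouncing dictionary such as CMUdict.
LEXICON = {
    "hello": "HH AH0 L OW1",
    "world": "W ER1 L D",
}


def annotate(text: str, lexicon: dict[str, str] = LEXICON) -> str:
    """Wrap known words as {ARPAbet}; leave unknown words as plain text."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        phones = lexicon.get(word.lower())  # case-insensitive lookup
        return "{" + phones + "}" if phones else word

    # Only runs of letters/apostrophes are treated as words; punctuation
    # and spacing pass through untouched.
    return re.sub(r"[A-Za-z']+", swap, text)


if __name__ == "__main__":
    print(annotate("Hello, world!"))
```

Here `annotate("Hello, world!")` returns `"{HH AH0 L OW1}, {W ER1 L D}!"`. Words missing from the lexicon fall back to spelling, which is exactly where a voice trained without phonetic annotation is most likely to fall flat.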