by 100ideas 3404 days ago
> "We conclude that the main barrier to progress towards natural TTS lies with duration and fundamental frequency prediction, and our systems have not meaningfully progressed past the state of the art in that regard."

Who is working on this problem, and how?

1 comment

We're working on this. Here is a very early demo of Julian: https://soundcloud.com/komponant/julian-speech-demo Don't be surprised that he sounds like a teenager with a high-pitched voice recorded in his bedroom; that's how the sample library was recorded. NB: the expression parameters (durations, F0) are manually adjusted, not predicted by a neural network. We've built a fully flexible text-to-voice engine, not the brain that goes with it. We're looking for people with ML experience to work on that part, so feel free to contact us.