by CSSer 1220 days ago
Potentially yes. https://valle-demo.github.io/

> VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt.

1 comment

Yeah, this is good, but just imagine how good these models will be in 5-10 years. I think they'll be indistinguishable from human speech.