| HN Mirror


	by CSSer 1220 days ago
	Potentially yes. https://valle-demo.github.io/
	> VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt.

1 comment

Yeah, this is good, but just imagine how good these models will be in 5-10 years. I think they will be indistinguishable from human speech.