by CSSer
1220 days ago
Potentially yes. https://valle-demo.github.io/

> VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt.