by cjonas 495 days ago
What scenario would be considered "zero shot" voice cloning?

At the very least, wouldn't you have to provide one sample? That would make it "few shot" (if that term even makes sense in this context).

1 comment

I think the key distinction is that the model is never trained on data from that specific speaker. You can view the sample as just an input at inference time (the voice to clone), not as a training example.

It would be more like training examples if you had to give it specific phrases.
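
To make that distinction concrete, here is a toy sketch in Python. Everything in it is hypothetical (`ToyTTS`, `speaker_embedding`, `fine_tune`, and the arithmetic are made up for illustration and are not taken from any real voice-cloning system). It only shows the shape of the argument: in the zero-shot case the reference clip is read as an input and the weights stay as they were, while in the training case the speaker's clips change the weights.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTTS:
    """Hypothetical stand-in for a pretrained voice-cloning model."""
    weights: tuple  # set once by "pretraining"; frozen, so it cannot be reassigned

    def speaker_embedding(self, reference_audio):
        # Summarise the reference clip as a small vector (here: mean and peak).
        mean = sum(reference_audio) / len(reference_audio)
        peak = max(abs(x) for x in reference_audio)
        return (mean, peak)

    def synthesize(self, text, reference_audio):
        # Zero-shot: the clip only conditions the output.
        # The weights are read here, never written.
        mean, peak = self.speaker_embedding(reference_audio)
        voice = self.weights[0] * mean + self.weights[1] * peak
        return [voice + ord(ch) / 1000 for ch in text]


def fine_tune(model, clips, lr=0.1):
    # Contrast case: the speaker's clips are used as training data,
    # so the returned model has different weights.
    w0, w1 = model.weights
    for clip in clips:
        mean, peak = model.speaker_embedding(clip)
        w0 += lr * mean
        w1 += lr * peak
    return ToyTTS(weights=(w0, w1))
```

Calling `model.synthesize("hi", clip)` leaves `model.weights` unchanged no matter which clip you pass, whereas `fine_tune(model, [clip])` returns a model with new weights. On this reading, "zero shot" counts training examples for the speaker, not inputs.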