HN Mirror


by InspiredIdiot 1198 days ago

I think the parent is saying that the model does not require any paired samples, i.e. audio of the voice to be synthesized together with the corresponding text. So, based on my understanding:

one shot - given the text "run faster" along with Alan Greenspan's voice pronouncing that phrase, the model can produce Alan Greenspan's voice saying any other phrase

zero shot - given only Alan Greenspan's voice pronouncing "run faster" but no text version of what was said, the model can produce Alan Greenspan's voice saying any other phrase
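
To make the distinction concrete, here is a rough Python sketch of my reading. `VoicePrompt` and `shot_type` are names I made up for illustration, not part of any real TTS library, and the labels follow the definitions above rather than any official taxonomy. The only thing it encodes is whether a transcript accompanies the reference audio:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoicePrompt:
    """Hypothetical container for what the model is given about the target voice."""
    audio: bytes                      # recording of the speaker, e.g. saying "run faster"
    transcript: Optional[str] = None  # text of what was said, if it is supplied


def shot_type(prompt: VoicePrompt) -> str:
    """Label a prompt using the definitions in this comment (an assumption, not a standard)."""
    # Audio paired with its text counts as one labeled example: "one shot".
    # Audio alone, with no text, counts as no labeled example: "zero shot".
    return "one shot" if prompt.transcript is not None else "zero shot"


# Placeholder bytes stand in for a real recording.
paired = VoicePrompt(audio=b"<greenspan clip>", transcript="run faster")
audio_only = VoicePrompt(audio=b"<greenspan clip>")

print(shot_type(paired))
print(shot_type(audio_only))
```

In both cases the model hears the same clip; what changes is only whether the text "run faster" comes with it.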

1 comment

CyberDildonics 1198 days ago

Does that mean a shot is text?
