by erichocean
3571 days ago
> It seems like you're using WaveNet to do speech-to-text

I'm proposing reducing a vocal performance into the corresponding WaveNet input. At no point in that process is the actual "text" recovered, and doing so would defeat the whole purpose, since I don't care about the text, I care about the performance of speaking the text (whatever it was).

In your example, I can't force Trump to say something in particular. But I can force myself, so I could record myself saying something I wanted Clinton to say [Step 3] (and in a particular way, too!), and if I had a trained WaveNet for myself and Clinton, I could make it seem like Clinton actually said it.

Text -> features -> TrumpWaveNet -> Trump saying your text