by dhammack 3572 days ago
It seems like you're using WaveNet to do speech-to-text when we have better tools for that. To transfer text from Trump to Clinton, first run speech-to-text on Trump speech and then give that to a WaveNet trained on Clinton to generate speech that sounds like her but says the same thing as Trump.

> It seems like you're using WaveNet to do speech-to-text

I'm proposing reducing a vocal performance to the corresponding WaveNet input. At no point in that process is the actual "text" recovered, and recovering it would defeat the whole purpose: I don't care about the text, I care about the performance of speaking the text (whatever it was).

In your example, I can't force Trump to say something in particular. But I can force myself, so I could record myself saying something I wanted Clinton to say [Step 3] (and in a particular way, too!), and if I had trained WaveNets for both myself and Clinton, I could make it seem like Clinton actually said it.

I see. I still think it's easier to apply DeepMind's feature transform to text than to try to invert a neural network. Armed with a network trained on Trump and DeepMind's feature transform from text -> network inputs, you should be able to make him say whatever you want, right?

Text -> features -> TrumpWaveNet -> Trump saying your text
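
A toy sketch of that pipeline, just to make the data flow concrete. Everything here is a hypothetical stand-in: `text_to_features` is not DeepMind's actual feature transform, and `make_speaker_model` only mimics the interface of a speaker-specific WaveNet (features in, samples out) without any real synthesis.

```python
from typing import Callable, List, Sequence

Features = List[float]
Audio = List[float]


def text_to_features(text: str) -> Features:
    """Hypothetical text -> conditioning features step.

    Toy version: one number per character. A real system would emit
    linguistic features (phonemes, durations, pitch contours, ...).
    """
    return [(ord(ch) % 128) / 128.0 for ch in text.lower()]


def make_speaker_model(voice_offset: float) -> Callable[[Sequence[float]], Audio]:
    """Hypothetical stand-in for a WaveNet trained on one speaker.

    The offset only marks "which voice" produced the output; it is not
    how a real model encodes speaker identity.
    """
    def generate(features: Sequence[float]) -> Audio:
        return [f + voice_offset for f in features]

    return generate


def synthesize(text: str, speaker_model: Callable[[Sequence[float]], Audio]) -> Audio:
    """Text -> features -> speaker model -> audio."""
    return speaker_model(text_to_features(text))


# Same text, two different (pretend) speaker models.
trump_wavenet = make_speaker_model(0.1)
clinton_wavenet = make_speaker_model(0.3)

trump_audio = synthesize("hello", trump_wavenet)
clinton_audio = synthesize("hello", clinton_wavenet)
```

The point is only that the features are speaker-independent, so swapping the model at the last step swaps the voice while the content stays fixed.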

> Armed with a network trained on Trump and DeepMind's feature transform from text -> network inputs, you should be able to make him say whatever you want, right?

Yes, that should work, and by tweaking the WaveNet input appropriately, you could also get him to say it in a particular way.