|
|
|
|
|
by erichocean
3571 days ago
|
|
These are the "inputs" I'm talking about recovering (from the link): "In order to use WaveNet to turn text into speech, we have to tell it what the text is. We do this by transforming the text into a sequence of linguistic and phonetic features (which contain information about the current phoneme, syllable, word, etc.) and by feeding it into WaveNet." The raw audio from Step 3 was (in principle) generated by that input on a properly trained WaveNet. We need to recover that so we can transfer it to the target WaveNet. How a specific WaveNet instance is configured (as you point out, it's part of the model parameters) is an implementation detail that is irrelevant for the steps I proposed. |
|