by alphaBetaGamma 3560 days ago
Slightly off-topic question about WaveNet:

In the paper, they say that they double the dilation factor up to a limit and then repeat: 1, 2, 4, ..., 512, 1, 2, 4, ..., 512, 1, 2, 4, ..., 512

The doubling of the dilation factor makes sense to me, but what is happening with the "repeat" part? I don't understand what they are trying to do. Wouldn't it make more sense to continue doubling?

1 comment

My intuition is that doubling up to 512 does increase the receptive field, but one such group is essentially a non-linear convolutional filter with a kernel size of 1024. The network benefits from stacking several of these groups, because each group can again convolve over the previous group's outputs at every temporal distance, which allows it to learn deeper, higher-level features. It is similar to the stacked 2D convolutions used for images, where each subsequent convolutional layer learns more abstract, higher-level features of the data. This is just intuition, though; there is no evidence yet that it holds for WaveNet's architecture.
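
To put rough numbers on the receptive-field part, here is a minimal Python sketch. It assumes causal convolutions with a kernel size of 2 (my assumption, chosen because it matches the 1024 figure above) and no other layers that widen the context. The function names are made up for illustration and are not from the paper or any library.

```python
def dilation_schedule(max_dilation=512, repeats=3):
    """Dilations 1, 2, 4, ..., max_dilation, repeated `repeats` times."""
    block = []
    d = 1
    while d <= max_dilation:
        block.append(d)
        d *= 2
    return block * repeats


def receptive_field(dilations, kernel_size=2):
    """Receptive field of a stack of dilated convolutions.

    Each layer extends the context by (kernel_size - 1) * dilation samples.
    kernel_size=2 is an assumption, not something stated in the thread.
    """
    return 1 + (kernel_size - 1) * sum(dilations)


one_block = dilation_schedule(512, repeats=1)      # 10 layers
three_blocks = dilation_schedule(512, repeats=3)   # 30 layers
# Hypothetical alternative: keep doubling for the same 30 layers.
keep_doubling = [2 ** i for i in range(len(three_blocks))]

print(receptive_field(one_block))      # 1024
print(receptive_field(three_blocks))   # 3070
print(receptive_field(keep_doubling))  # 1073741824
```

Under these assumptions, repeating the group adds depth while the receptive field grows only linearly (1024 to 3070 samples), whereas doubling through all 30 layers would give a context of about a billion samples, with the top layers looking at inputs far longer than most training clips.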