by dharma1 3571 days ago
The samples sound amazing. These causal convolutions look like a great idea; I'll have to re-read the paper a few times. All the previous generative audio from raw audio samples I've heard (using LSTMs) has been super noisy. These are crystal clear.

Dilated convolutions are already implemented in TF, so I look forward to someone implementing this paper and publishing the code.
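For anyone wanting to see the idea without TF, here is a minimal NumPy sketch of a causal dilated 1D convolution. It is an illustration under my own assumptions, not the paper's code or TF's API: the names `causal_dilated_conv1d` and `receptive_field` are hypothetical, the input is a single channel, and the signal is left-padded with zeros so that output `t` only ever sees inputs at `t` and earlier.

```python
import numpy as np


def causal_dilated_conv1d(x, w, dilation=1):
    """Compute y[t] = sum_k w[k] * x[t - k * dilation].

    Samples before the start of the signal are treated as zero, so y[t]
    never depends on x[t + 1], x[t + 2], ... (that is the "causal" part).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    pad = (len(w) - 1) * dilation          # how far back the filter reaches
    xp = np.concatenate([np.zeros(pad), x])  # left-pad only, never right-pad
    y = np.zeros(len(x))
    for k, wk in enumerate(w):
        # x[t - k * dilation] lives at index t + pad - k * dilation in xp
        start = pad - k * dilation
        y += wk * xp[start:start + len(x)]
    return y


def receptive_field(kernel_size, dilations):
    """Number of input samples one output sees after stacking the layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = causal_dilated_conv1d(x, w=[1.0, 1.0], dilation=2)  # y[t] = x[t] + x[t-2]
```

Stacking layers with doubling dilations is what makes the reach grow quickly: with a kernel of size 2 and dilations 1, 2, 4, 8, `receptive_field(2, [1, 2, 4, 8])` covers 16 input samples using only four layers.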

1 comment

I did a review of PixelCNN as part of my summer internship. It covers a bit about how careful masking can be used to create a chain of conditional probabilities [0], which AFAIK is exactly how this "causal convolution" works (an output can't have dependencies in the 'future'). The PixelCNN and PixelRNN papers also cover this in a fair bit of detail. Ishaan Gulrajani's code is also a great implementation reference for PixelCNN / masking [1].

[0] https://github.com/tensorflow/magenta/blob/master/magenta/re...

[1] https://github.com/igul222/pixel_rnn/blob/master/pixel_rnn.p...
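To make the masking idea concrete, here is a rough NumPy sketch of the kind of kernel mask involved. It is not taken from either link above; `pixelcnn_mask` is a hypothetical name, and it assumes a single-channel image in raster-scan order (it ignores the extra ordering between colour channels that the papers describe). Multiplying a convolution kernel by this mask zeroes every weight that would look at the current pixel's "future", which is what lets the model factor as p(x) = prod_i p(x_i | x_1, ..., x_{i-1}).

```python
import numpy as np


def pixelcnn_mask(kernel_size, mask_type="A"):
    """Return a (kernel_size, kernel_size) 0/1 mask for a raster-scan ordering.

    Rows above the centre are fully visible; in the centre row only the
    pixels to the left are visible. Type "A" also hides the centre pixel
    (used in the first layer), type "B" keeps it (used in later layers).
    """
    if kernel_size % 2 != 1:
        raise ValueError("kernel_size must be odd")
    if mask_type not in ("A", "B"):
        raise ValueError("mask_type must be 'A' or 'B'")
    c = kernel_size // 2
    mask = np.zeros((kernel_size, kernel_size))
    mask[:c, :] = 1.0   # every row strictly above the centre
    mask[c, :c] = 1.0   # centre row, strictly left of the centre
    if mask_type == "B":
        mask[c, c] = 1.0  # later layers may see the centre position
    return mask


mask_a = pixelcnn_mask(3, "A")
mask_b = pixelcnn_mask(3, "B")
```

In one dimension the same trick reduces to keeping only the kernel taps at or before the current time step, which is the link to causal convolutions on audio.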

Heh, just read it! Very useful, will have to go through it in detail.