by kastnerkyle 3571 days ago
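Training is relatively fast: thanks to masking, every output position can be computed in parallel from the ground-truth input, so you don't have to sample during training. Generation is the slow part, because sampling is a sequential process where each new value is conditioned on the ones already sampled. They talk about it a bit in the previous papers for PixelCNN and PixelRNN.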
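To make the asymmetry concrete, here is a rough sketch in NumPy. It is a toy 1-D linear autoregressive model, not the actual PixelCNN/PixelRNN architecture; the names `causal_predict` and `generate`, the weights, and the Gaussian noise are all assumptions made up for illustration. The "mask" is just a one-step shift so position t only sees earlier positions.

```python
import numpy as np

def causal_predict(x, w):
    """Predict every position from strictly earlier positions in one pass.

    out[t] = sum_m w[m] * x[t - 1 - m], with zeros before the sequence start.
    The one-step shift plays the role of the mask: out[t] never sees x[t].
    """
    shifted = np.concatenate([[0.0], x[:-1]])  # shifted[t] = x[t - 1]
    return np.convolve(shifted, w)[: len(x)]

def generate(n, w, rng):
    """Sample a sequence one step at a time (hypothetical toy sampler)."""
    x = np.zeros(n)
    means = np.zeros(n)
    for t in range(n):
        # Step t needs x[:t], which only exists after the earlier steps ran,
        # so this loop cannot be parallelized across t.
        means[t] = causal_predict(x, w)[t]
        x[t] = means[t] + rng.normal()
    return x, means

rng = np.random.default_rng(0)
w = np.array([0.5, -0.2])  # assumed toy weights

# Generation: n sequential steps.
x, means = generate(8, w, rng)

# "Training" view: with the full sequence known, a single pass gives the
# prediction for every position at once.
parallel = causal_predict(x, w)
print(np.allclose(parallel, means))
```

The single call to `causal_predict` on the known sequence reproduces the per-step predictions that generation had to compute one at a time, which is the point: with the data in hand nothing needs to be sampled, while at generation time each step waits on the previous one.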