HN Mirror



	by fastoptimizer 3571 days ago
	Do they say how long generation takes? Is this insanely slow to train but extremely fast at generation?

4 comments

georgehm 3571 days ago

"After training, we can sample the network to generate synthetic utterances. At each step during sampling a value is drawn from the probability distribution computed by the network. This value is then fed back into the input and a new prediction for the next step is made. Building up samples one step at a time like this is computationally expensive, but we have found it essential for generating complex, realistic-sounding audio."

So it looks like generation is a slow process.
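
A minimal sketch of the sampling loop the quote describes, to show why it is inherently step-by-step. The "network" here is a hypothetical toy function (`toy_next_step_distribution`), and the 256 output levels are an assumption for illustration, not something stated in the quote.

```python
import random

NUM_LEVELS = 256  # assumed number of quantized output values (illustrative)


def toy_next_step_distribution(history):
    """Hypothetical stand-in for the trained network.

    Returns a probability distribution over the next value, given every
    value generated so far.
    """
    last = history[-1] if history else NUM_LEVELS // 2
    # Favor values near the previous one so the toy output varies smoothly.
    weights = [1.0 / (1 + abs(v - last)) for v in range(NUM_LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_steps, seed=0):
    rng = random.Random(seed)
    samples = []
    for _ in range(num_steps):
        # One model evaluation per output step; it cannot start until the
        # previous step's value has been drawn.
        probs = toy_next_step_distribution(samples)
        value = rng.choices(range(NUM_LEVELS), weights=probs, k=1)[0]
        samples.append(value)  # fed back in as input for the next step
    return samples


audio = generate(100)
```

Each pass through the loop needs the value drawn in the previous pass, so the number of model evaluations grows with the number of output steps and the steps cannot be run side by side.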


kastnerkyle 3571 days ago

Training is relatively fast (thanks to parallelism and masking, you don't have to sample during training), but at generation time sampling is a sequential process. They talk about it a bit in the previous PixelCNN and PixelRNN papers.
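
A rough sketch of that asymmetry, assuming a tiny linear causal filter as a stand-in for the real model (the function names and weights are made up for illustration). During training the whole target sequence is known, so the prediction at every position can be computed independently. During generation each prediction depends on the model's own earlier outputs.

```python
def causal_predict(history, weights):
    """Predict the next value from the most recent len(weights) values.

    Positions before the start of the sequence count as zero.
    """
    total = 0.0
    for k, w in enumerate(weights):
        idx = len(history) - 1 - k
        if idx >= 0:
            total += w * history[idx]
    return total


def training_predictions(target, weights):
    # Every input comes from the known target, so each position is
    # independent of the others and could be computed in parallel.
    return [causal_predict(target[:t], weights) for t in range(len(target))]


def generate(first, weights, num_steps):
    # Each step consumes the model's own previous outputs, so the steps
    # must run one after another.
    out = [first]
    for _ in range(num_steps - 1):
        out.append(causal_predict(out, weights))
    return out


weights = [0.5, 0.25]
train_out = training_predictions([1.0, 2.0, 3.0, 4.0], weights)
gen_out = generate(1.0, weights, 4)
```

The list comprehension in `training_predictions` has no dependency between iterations, which is what makes the training pass easy to parallelize. The loop in `generate` has no such freedom.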


microtherion 3570 days ago

According to third-hand reports I've heard (apply copious amounts of salt), it may take 1 hour of CPU time to generate 1 second of speech.
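
For scale, a back-of-the-envelope calculation from that unverified figure. The 16 kHz sample rate is an assumption used only for illustration; the thread does not state it.

```python
# Unverified third-hand figure from the comment above.
cpu_seconds_per_audio_second = 60 * 60  # 1 hour of CPU time per 1 s of speech

# Assumption: 16,000 audio samples per second (not stated in the thread).
sample_rate_hz = 16_000

# How many times slower than real time generation would be.
real_time_factor = cpu_seconds_per_audio_second / 1.0

# CPU time spent per generated audio sample under the assumed sample rate.
cpu_seconds_per_sample = cpu_seconds_per_audio_second / sample_rate_hz

print(f"{real_time_factor:.0f}x slower than real time")
print(f"{cpu_seconds_per_sample * 1000:.0f} ms of CPU time per sample")
```

Under those assumptions, generation would run 3600 times slower than real time, or about 225 ms of CPU time per sample.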


lucb1e 3571 days ago

I was wondering the same. They don't mention anything about how long it took or on what kind of system. Even for a first beta, that would give us some ballpark idea of how slow it is. It's clearly slow, and they're holding back exactly how slow, so it's probably bad.
