HN Mirror

Hacker News


	by svantana 3564 days ago
	The most interesting thing here is the note at the bottom about computational cost: "A recent macbook pro reaches about 5 samples per second." This shows how far this model is from real-time use. However, I'm sure DeepMind researchers are already looking into making it block-based or finding some other optimization strategy.
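To put that figure in perspective: autoregressive generation produces one sample at a time, so the wall-clock time to generate one second of audio is the sampling rate divided by the generation speed. Below is a minimal sketch, assuming the 4 kHz sampling rate of the downsized model mentioned further down the thread. The function name is hypothetical.

```python
def seconds_per_audio_second(sample_rate_hz: float, samples_per_sec: float) -> float:
    """Wall-clock seconds needed to generate one second of audio,
    assuming strictly sequential (one sample at a time) generation."""
    return sample_rate_hz / samples_per_sec


# Figures from the thread: ~5 samples/s on a recent MacBook Pro,
# with the downsized model running at a 4 kHz sampling rate (assumed here).
cost = seconds_per_audio_second(4000, 5)
print(f"{cost:.0f} s (~{cost / 60:.1f} min) of compute per second of audio")
```

At that rate, generation runs roughly 800 times slower than real time.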

1 comment

basve 3564 days ago

And I should add that this was measured using a downsized model (just two blocks of dilated convolutions and a sampling rate of 4 kHz). DeepMind's paper does not report how many stacks are used to generate the samples, but I assume it's considerably more.
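For a sense of how much context "two blocks of dilated convolutions" covers: the receptive field of a stack of causal dilated convolutions is (kernel_size − 1) × (sum of dilations) + 1 samples. The sketch below assumes a kernel size of 2 and dilations doubling from 1 to 512 within each block, a schedule described in the WaveNet paper. This implementation's actual settings may differ, and the count covers the dilated layers only.

```python
def receptive_field(dilations: list[int], kernel_size: int = 2) -> int:
    """Receptive field, in samples, of a stack of causal dilated convolutions.
    Each layer widens the field by (kernel_size - 1) * dilation."""
    return (kernel_size - 1) * sum(dilations) + 1


# Assumption: each block uses dilations 1, 2, 4, ..., 512 (10 layers).
block = [2 ** i for i in range(10)]
two_blocks = block * 2

rf = receptive_field(two_blocks)
sample_rate_hz = 4000  # sampling rate of the downsized model, per the comment
print(f"{rf} samples = {rf / sample_rate_hz * 1000:.0f} ms of context")
```

In a straightforward implementation, every new sample requires a forward pass over all of these layers. Adding more stacks deepens the network and increases the cost of each sample.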

unlikelymordant 3564 days ago

They (DeepMind) reported in a tweet that it took 90 minutes of processing to generate 1 s of speech. Hopefully this will come down in the future.

espadrine 3564 days ago

Do you have a link?

This implementation says: “A Tesla K80 needs around ~4 minutes for generating a second of audio at a sampling rate of 4000hz”, which is significantly faster.
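The two figures are easier to compare as slowdown factors relative to real time. This is compute time divided by audio duration, which does not depend on the sampling rate. The models and hardware differ, so treat the comparison as rough. The function name below is hypothetical.

```python
def realtime_factor(compute_seconds: float, audio_seconds: float = 1.0) -> float:
    """How many times slower than real time generation runs."""
    return compute_seconds / audio_seconds


# Figures quoted in the thread, per second of generated audio.
deepmind_tweet = realtime_factor(90 * 60)  # 90 minutes (reported via tweet)
tesla_k80 = realtime_factor(4 * 60)        # ~4 minutes at 4 kHz (this implementation)

print(f"DeepMind (tweet): {deepmind_tweet:.0f}x slower than real time")
print(f"Tesla K80:        {tesla_k80:.0f}x slower than real time")
print(f"K80 throughput:   {4000 / (4 * 60):.1f} samples/s")
```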

basve 3564 days ago

The figure of 90 minutes per second of audio was reported on Twitter by someone from Google, but the tweet has since been deleted. I've clarified in the README that my measurements are for a much lighter/smaller model than DeepMind's :).
