Hacker News
by svantana 3564 days ago
The most interesting thing here is the note at the bottom regarding computational cost: "A recent macbook pro reaches about 5 samples per second."

This shows how far this model is from real-time usage. However, I'm sure DeepMind researchers are already looking into making generation block-based, or into some other optimization strategy.
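Here is one example of the kind of optimization meant here. A naive autoregressive sampler reruns the whole convolution stack for every new sample. Caching each layer's recent inputs instead lets each new sample cost one evaluation per layer. Below is a minimal pure-Python sketch. It assumes a toy scalar network with kernel size 2, hypothetical dilations and weights, and the output fed straight back as the next input instead of being sampled from a softmax. It is not DeepMind's method or this repository's code.

```python
import math
from collections import deque

# Hypothetical toy network: scalar channels, kernel size 2, one block of dilations.
DILATIONS = [1, 2, 4, 8]
# Hypothetical per-layer weights: (weight on the input d steps back, weight on the current input).
WEIGHTS = [(0.5, 0.9), (-0.3, 0.8), (0.7, 0.6), (0.2, 1.1)]


def layer(past, current, a, b):
    """One dilated causal conv (two taps) followed by a tanh nonlinearity."""
    return math.tanh(a * past + b * current)


def naive_generate(n, x0=1.0):
    """Recompute every layer over the whole history for each new sample."""
    xs = [x0]
    for _ in range(n):
        h = list(xs)
        for d, (a, b) in zip(DILATIONS, WEIGHTS):
            # Zero padding stands in for inputs before t = 0 (causal convolution).
            h = [layer(h[t - d] if t >= d else 0.0, h[t], a, b) for t in range(len(h))]
        xs.append(h[-1])  # feed the newest output back in as the next input
    return xs[1:]


def cached_generate(n, x0=1.0):
    """Keep a FIFO of each layer's last d inputs, so each sample costs one step per layer."""
    queues = [deque([0.0] * d) for d in DILATIONS]
    x, out = x0, []
    for _ in range(n):
        h = x
        for q, (a, b) in zip(queues, WEIGHTS):
            past = q.popleft()  # this layer's input from d steps ago (0.0 before t = d)
            q.append(h)         # store the current input for use d steps later
            h = layer(past, h, a, b)
        out.append(h)
        x = h
    return out


print(naive_generate(20) == cached_generate(20))  # True
```

Both functions produce identical samples. In the naive loop, the work per sample grows with the history. A real implementation that crops its input to the receptive field would still pay roughly layers × receptive-field size per sample. The cached loop does one evaluation per layer.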

I should also add that this was measured using a downsized model (just two blocks of dilated convolutions and a sampling rate of 4 kHz). DeepMind's paper does not report how many stacks were used to generate the samples, but I assume it's considerably more.
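The following sketch shows how much context "two blocks of dilated convolutions" gives. For a stack of causal convolutions with kernel size k, the receptive field is (k - 1) times the sum of the dilations, plus one. The sketch assumes kernel size 2 and blocks of ten layers with dilations 1, 2, 4, ..., 512 (the layout described in the WaveNet paper). The actual layer count of this implementation isn't stated here, and any initial causal convolution is ignored.

```python
def receptive_field(dilations, kernel_size=2):
    """Number of input samples (including the current one) that one output can see."""
    # Each causal layer extends the context by (kernel_size - 1) * dilation samples.
    return (kernel_size - 1) * sum(dilations) + 1


# Assumed block layout: dilations 1, 2, 4, ..., 512 (ten layers), repeated twice.
block = [2 ** i for i in range(10)]
rf = receptive_field(block * 2)
print(rf, "samples =", rf / 4000, "s at 4 kHz")
```

Under these assumptions, two blocks see 2047 samples, or about 0.51 s of audio at 4 kHz. Each additional block adds another 1023 samples.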
They (DeepMind) reported in a tweet that it took 90 minutes of processing to generate 1 s of speech. Hopefully this comes down in the future.
Do you have a link?

The README of this implementation says: “A Tesla K80 needs around ~4 minutes for generating a second of audio at a sampling rate of 4000hz”, which is significantly faster than 90 minutes.
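The quoted figures are easier to compare as throughput and as a real-time factor (seconds of compute per second of audio). The sketch below uses only the numbers given in this thread. The DeepMind figure comes from a larger model at an unstated sampling rate, so its row is not directly comparable with the other two.

```python
SAMPLE_RATE_HZ = 4000  # sampling rate of the lighter model behind both README figures


def seconds_per_audio_second(samples_per_sec, sample_rate_hz=SAMPLE_RATE_HZ):
    """Wall-clock seconds needed to generate one second of audio."""
    return sample_rate_hz / samples_per_sec


def samples_per_sec(seconds_per_audio_sec, sample_rate_hz=SAMPLE_RATE_HZ):
    """Generated samples per wall-clock second."""
    return sample_rate_hz / seconds_per_audio_sec


# Seconds of compute per second of audio; this is also the factor slower than real time.
timings = {
    "MacBook Pro (README, ~5 samples/s)": seconds_per_audio_second(5),
    "Tesla K80 (README, ~4 min)": 4 * 60,
    "DeepMind (deleted tweet, larger model)": 90 * 60,
}

for name, secs in timings.items():
    print(f"{name}: {secs:.0f} s ({secs / 60:.1f} min) per second of audio, "
          f"{secs:.0f}x slower than real time")

print(f"K80 throughput: {samples_per_sec(4 * 60):.1f} samples/s")
```

At 4 kHz, the MacBook figure works out to 800 s (about 13 minutes) per second of audio. The K80 needs about 240 s, or roughly 17 samples/s. That makes the K80 about 3.3 times faster than the MacBook, but still far from real time.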

The 90 minutes for 1 s of audio was reported by someone from Google on Twitter, but the tweet has been deleted. I've clarified in the README that my measurements are for a much lighter/smaller model than DeepMind's :).