by basve 3565 days ago
And I should add that this was measured using a downsized model (just two blocks of dilated convolutions and a sampling rate of 4 kHz). DeepMind's paper does not report how many stacks were used to generate the samples, but I assume it's considerably more.
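To give a feel for what "two blocks" means, here is a minimal sketch of how the receptive field of stacked causal dilated convolutions grows with the number of blocks. It assumes a WaveNet-style layout (kernel size 2, dilations 1, 2, 4, ..., 512 in each block); `receptive_field` and `receptive_field_ms` are hypothetical helpers, not part of this implementation or DeepMind's code.

```python
def receptive_field(num_blocks: int, layers_per_block: int = 10, kernel_size: int = 2) -> int:
    """Receptive field (in samples) of stacked causal dilated convolutions.

    ASSUMPTION: each block uses dilations 1, 2, 4, ..., 2**(layers_per_block - 1);
    any initial causal convolution is ignored.
    """
    dilations = [2 ** i for i in range(layers_per_block)] * num_blocks
    # A layer with kernel size k and dilation d widens the field by (k - 1) * d samples.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def receptive_field_ms(num_blocks: int, sample_rate_hz: int, **kwargs) -> float:
    """Same receptive field expressed in milliseconds of audio."""
    return 1000.0 * receptive_field(num_blocks, **kwargs) / sample_rate_hz


if __name__ == "__main__":
    for blocks in (2, 5):
        print(blocks, "blocks:", receptive_field(blocks), "samples,",
              f"{receptive_field_ms(blocks, 4000):.1f} ms at 4 kHz")
```

Under these assumptions two blocks see 2047 samples (about half a second at 4 kHz). Every extra block adds ten more convolution layers that must run for each generated sample, so a deeper model is proportionally slower to sample from.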

They (DeepMind) reported via tweet that it took 90 minutes of processing to generate 1 s of speech. Hopefully this comes down in the future.
Do you have a link?

This implementation says: “A Tesla K80 needs around ~4 minutes for generating a second of audio at a sampling rate of 4000hz”, which is significantly faster.
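Because the two figures are for different sampling rates, a rough per-sample comparison may be fairer. This is a minimal sketch of that arithmetic; it assumes the 90-minute figure refers to 16 kHz audio (the rate used in the WaveNet paper), which the thread does not state, and `seconds_per_sample` is a hypothetical helper.

```python
def seconds_per_sample(wall_clock_s: float, audio_s: float, sample_rate_hz: int) -> float:
    """Average wall-clock generation time per output audio sample."""
    return wall_clock_s / (audio_s * sample_rate_hz)


# Figure from the thread: ~4 minutes per second of 4 kHz audio on a Tesla K80.
this_impl = seconds_per_sample(4 * 60, 1.0, 4000)
# 90 minutes per second of audio; the 16 kHz rate is an ASSUMPTION, not stated in the thread.
deepmind = seconds_per_sample(90 * 60, 1.0, 16000)

if __name__ == "__main__":
    print(f"this implementation: {this_impl * 1000:.1f} ms/sample")
    print(f"reported DeepMind figure: {deepmind * 1000:.1f} ms/sample")
    print(f"ratio: {deepmind / this_impl:.1f}x")
```

Under that assumption the gap shrinks from roughly 22x per second of audio to under 6x per sample, and the two models differ in size, so even this is not a like-for-like comparison.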

The figure of 90 minutes for 1 s of audio was reported on Twitter by someone from Google, but the tweet has since been deleted. I've clarified in the README that my measurements are for a much lighter/smaller model than DeepMind's :).