
	by teddyknox 3607 days ago
	Doing a forward pass for every sample sounds like it would be prohibitive for real-time applications.

nicklo 3607 days ago

It absolutely is. DeepMind reported that 1 second of audio takes about 90 minutes to generate.

throwawaymsft 3607 days ago

Assuming it's computation bound, it's a factor of 5400 (~13 doublings in CPU power required to get to real-time, assuming no algorithmic improvements).
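The arithmetic behind that estimate can be checked with a short sketch. It assumes the 90-minutes-per-second figure quoted above and that generation time shrinks in direct proportion to added compute; the variable names are only illustrative.

```python
import math

# Figure quoted in the thread: 90 minutes of compute per 1 second of audio.
compute_seconds = 90 * 60
audio_seconds = 1

# How many times slower than real-time the generation is.
slowdown = compute_seconds / audio_seconds

# Number of speed doublings needed to reach real-time,
# assuming perfect scaling and no algorithmic improvements.
doublings = math.log2(slowdown)

print(f"slowdown factor: {slowdown:.0f}")
print(f"doublings needed: {doublings:.1f}, i.e. {math.ceil(doublings)} whole doublings")
```

The logarithm comes out a little above 12, so 13 whole doublings are needed to cross the real-time threshold.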

mdsteph 3606 days ago

If I'm not mistaken, the current limitation is that a dependent sequence of audio has to be produced sequentially. Perhaps independent sentences could be generated simultaneously using copies of the net, assuming no memory limitations. I wonder if it's already possible to create an audiobook, for instance, in a reasonable time.
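A rough sketch of that idea, under heavy assumptions: `synthesize` is a hypothetical stand-in for a slow, strictly sequential per-sentence generator (it is not WaveNet), and `synthesize_book` fans independent sentences out to a pool of workers. Threads are used here only to keep the example short; a real setup would more likely give each worker its own process or device holding a copy of the net.

```python
from concurrent.futures import ThreadPoolExecutor

def synthesize(sentence):
    """Hypothetical stand-in for a sequential generator: each output
    value depends on the previous one, so a single sentence cannot be split."""
    samples = []
    prev = 0.0
    for ch in sentence:
        prev = 0.5 * prev + ord(ch) / 255.0
        samples.append(prev)
    return samples

def synthesize_book(sentences, workers=4):
    """Generate independent sentences concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(synthesize, sentences))

if __name__ == "__main__":
    book = ["Call me Ishmael.", "Some years ago.", "Never mind how long."]
    clips = synthesize_book(book)
    print([len(clip) for clip in clips])
```

This only improves throughput across sentences; the latency of any single sentence is unchanged.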

mattnewton 3607 days ago

Do they mention it was CPU trained? I assumed GPU. If it was CPU trained, I wonder what the operations keeping it off the GPU were?

Houshalter 3606 days ago

Google has special neural net ASICs now.

ogrisel 3606 days ago

Google never stated that they use those to train models, as far as I know. It seems they are primarily used to save energy when deploying trained models at scale.

Houshalter 3606 days ago

There's no reason they couldn't use them to train, as long as they can account for the lower-precision operations. I think it would be much cheaper to train on them, at that scale anyway.

jcannell 3606 days ago

That doesn't imply they can run WaveNet yet - for inference this net is sort of worst-case serial. Their TPU ASIC is almost certainly highly parallel, like a GPU - it actually has to be that way for energy efficiency (which is its claimed benefit).

WaveNet actually looks like it could possibly have been designed to run on CPUs in production, at least once they optimize it further. Sampling is super slow right now because it requires an enormous number of tiny dependent TF ops, and thus kernels that have huge overhead for tiny amounts of work. A custom implementation could probably circumvent that by evaluating all the layers sequentially in local cache on a fast CPU.

Or they just designed it without much concern for production plausibility yet.
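The serial dependency described above can be illustrated with a toy autoregressive loop. This is only a sketch, not WaveNet: `toy_next_sample` is a hypothetical stand-in for one full forward pass, and the 16 kHz rate and receptive field of 8 are assumptions chosen for illustration. The point is that sample t cannot be computed until sample t-1 exists, so the outer loop cannot be parallelized even if the work inside each step can.

```python
import math
import random

def toy_next_sample(context, weights):
    """Hypothetical stand-in for one forward pass of the network:
    predicts the next sample from the most recent ones."""
    return math.tanh(sum(c * w for c, w in zip(context, weights)))

def generate(n_samples, receptive_field=8, seed=0):
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 0.5) for _ in range(receptive_field)]
    # Seed context so the first prediction has something to condition on.
    audio = [rng.gauss(0.0, 0.1) for _ in range(receptive_field)]
    for _ in range(n_samples):
        # Each step reads samples produced by earlier steps,
        # so the iterations must run one after another.
        context = audio[-receptive_field:]
        audio.append(toy_next_sample(context, weights))
    return audio[receptive_field:]

if __name__ == "__main__":
    one_second = generate(16000)  # assumed 16 kHz: 16,000 dependent steps
    print(len(one_second))
```

With one tiny forward pass per sample, fixed per-step overhead (op dispatch, kernel launch) is paid thousands of times per second of audio, which is why it can dominate the cost.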

Houshalter 3606 days ago

I'm not sure how this algorithm is serial. The neural net layers still involve huge convolutions that can all be done in parallel.

jbpetersen 3607 days ago

Building an ASIC for it would be another option to speed things up on the computation side.

Itsdijital 3607 days ago

Was that in the paper? I was looking for a source for it last night but couldn't find one.

nshm 3607 days ago

Why would an honest researcher mention the downsides of his work in a paper? No, it was on Twitter: https://www.reddit.com/r/MachineLearning/comments/51sr9t/dee...

confluence 3607 days ago

https://news.ycombinator.com/item?id=12463263

Looks like the source deleted their tweet.

lallysingh 3607 days ago

Can we just use 90 cores?

rryan 3606 days ago

Unfortunately no, see Amdahl's Law.

https://en.wikipedia.org/wiki/Amdahl%27s_law
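To make the point concrete, here is a small sketch of Amdahl's law for 90 cores. The parallel fractions are illustrative assumptions, not measurements of WaveNet.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Speedup predicted by Amdahl's law when only part of the work
    can be spread across n_cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)

if __name__ == "__main__":
    # Assumed fractions of the workload that can run in parallel.
    for p in (0.5, 0.9, 0.99):
        print(f"parallel fraction {p:.2f}: {amdahl_speedup(p, 90):.1f}x on 90 cores")
```

Even if 90% of the work were parallelizable, 90 cores would give only about a 9x speedup, far short of 90x. For sample-by-sample generation, where each step waits on the previous one, the serial fraction is likely to be large.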

arcanus 3607 days ago

Even if we did, the strong scaling is unlikely to be perfect.

c3534l 3607 days ago

We're still a couple of papers away from getting the computation down to a reasonable amount. Or eventually Moore's Law will take care of it. It might have applications that aren't real-time, too. I'm writing a video game in my spare time, and I was wondering how I would do the foley. If I could feed in some sounds and synthesize a library of sound effects that all sound different enough that it won't be repetitive to hear that same exact footstep sound for the entire game, then I consider that a win. So, you know, this is cutting-edge research we're talking about.
