Hacker News
by minimaxir 2245 days ago
From the GitHub repo:

"On a V100, it takes about 3 hrs to fully sample 20 seconds of music."

That might put building on this project out of reach for the average engineer (you certainly cannot run that in a Colab notebook), although the amount of compute required is not surprising.

3 comments

Eh. It's built on Transformers, and people have already demonstrated considerable model distillation/compression on those, just as with every other kind of NN. As they note, once you've trained a teacher model, you can probably train a wide flat model for similar results. (As I recall, WaveNet used to be similarly slow, but even without the parallel WaveNet retraining, proper caching of repeated states could make it orders of magnitude faster and approach realtime.)
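
To illustrate the caching point: a toy sketch, not WaveNet's or Jukebox's actual code. It assumes a made-up stack of dilated causal layers (the `DILATIONS`, `WEIGHTS`, and `layer` function are all hypothetical) and compares naive autoregressive sampling, which recomputes every layer over the whole history at each step, with a version that keeps each layer's recent inputs in a small queue.

```python
import math
from collections import deque

# Hypothetical toy network: one (w_past, w_now) pair per dilated causal layer.
DILATIONS = [1, 2, 4]
WEIGHTS = [(0.5, -0.3), (0.8, 0.2), (-0.6, 0.7)]


def layer(past, now, w):
    """Combine a layer's input from `d` steps ago with its current input."""
    return math.tanh(w[0] * past + w[1] * now)


def naive_generate(seed, steps):
    """Recompute all layers over the full history at every step."""
    xs = [seed]
    for _ in range(steps):
        h = xs[:]
        for d, w in zip(DILATIONS, WEIGHTS):
            # Inputs before the start of the sequence are treated as zero.
            h = [layer(h[t - d] if t >= d else 0.0, h[t], w)
                 for t in range(len(h))]
        xs.append(h[-1])  # the newest output becomes the next sample
    return xs


def cached_generate(seed, steps):
    """Same outputs, but each layer only remembers its last `d` inputs."""
    queues = [deque([0.0] * d, maxlen=d) for d in DILATIONS]
    xs = [seed]
    for _ in range(steps):
        h = xs[-1]
        for q, w in zip(queues, WEIGHTS):
            past = q[0]   # this layer's input from d steps ago
            q.append(h)   # maxlen drops the oldest entry automatically
            h = layer(past, h, w)
        xs.append(h)
    return xs
```

The naive version does work proportional to the sequence length at every step, while the cached version does a fixed amount per step. How much that saves in a real model depends on the architecture.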
They added a link to a Colab notebook. The upsampling takes most of that time, so if you're willing to deal with a noisy, compressed-sounding piece, it's actually very doable.
Isn’t that superhuman?

I would guess that on average, it takes a professional more than 36 hours ((4×60÷20)×3) to make a 4-minute audio track with original music based on given lyrics.
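
Spelled out, the estimate above looks like this. It is a back-of-the-envelope sketch that assumes sampling time scales linearly with track length, which the repo quote does not actually state; the function and parameter names are made up for illustration.

```python
def sampling_hours(track_seconds, chunk_seconds=20, hours_per_chunk=3):
    """Estimate sampling time, assuming cost grows linearly with length.

    The defaults come from the quoted figure: about 3 hours on a V100
    for 20 seconds of music.
    """
    return track_seconds / chunk_seconds * hours_per_chunk


# A 4-minute track is 240 seconds, i.e. 12 chunks of 20 seconds.
print(sampling_hours(4 * 60))  # 36.0
```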

I don’t really see the point of this comparison. Composing, arranging, and producing a song is not a benchmark you can profile against; musicians are not performing some kind of music compute that produces a set number of music units per hour.

Speaking from my own experience, I’ve had tracks that took months to complete, and I’ve had tracks that I got to probably 90% completion in under an hour. I would propose that there’s no meaningful definition of “superhuman” for creative efforts.

Agreed. Although "professional" pop production does tend to be somewhat involved, it doesn't have to be, and total time spent could vary so radically as to have essentially no correlation to anything else.
The professional's output would be a lot more listenable, though, most likely!
Definitely!

It’s impressive that they now “only” need to improve the quality for it to outcompete professional musicians on commercial delivery.