by svantana 3284 days ago
I'm sorry, but is the Deep Learning Hype strong enough to warp people's sensory perception? Every sample on this page sounds terrible IMHO, and pretty much like what you would get if you spent 10 minutes implementing the most naive spectrogram resynthesis you could think of. Granted, there is great promise in finding the "manifold of music", which seems to be the goal here, but what they show is just not anywhere near that promise.
4 comments

Agreed. The texture is nice - I enjoy a lo-fi sound - but the fun of sound engineering is building your own signal paths to modulate or destroy sound interactively. The more abstracted the sound generation method, the more of a toy and the less of a tool it is, because the rising non-linearities make it increasingly difficult to pursue a specific objective. This has always been a limiting factor for FM, where undirected noodling can certainly yield interesting results, but not very controllable ones beyond 3 or 4 operators.

I do think it's interesting and valuable work. But it's worth bearing in mind that there's no shortage of great resynthesis tools already, and that musicians are besieged with offers from technologists for Sounds! That! Have! Never! Been! Possible! Before! While you can always rely on Jordan Rudess to provide a celebrity endorsement to the keyboard collector crowd, most hobbyist musicians eventually get over chasing novelty and end up reducing their equipment load to a smaller number of really well-engineered devices or software tools that they really like and get to know inside out.

The 'cello' and 'laaa...' actually made me quickly remove my headphones. Having 'character' is not even close to how I would describe these.
They're using very low sample rates and 8-bit depth, not pretty. Until it can do 32-bit samples it's going to sound horrible.
I've read the articles about NSynth with interest, but I can't figure out why they're using 8-bit and low sample rates. Surely, it's not that much more computationally intensive that they can't tinker at 8 bits and then do a render at a high resolution once they've settled on some parameters they like.
Possibly the same reason all the Style Transfer implementations use very low resolution images? All the neural net applications I've seen seem to have problems with high resolutions in any form.
The 8-bit is actually reasonable: they have one output per possible sample value, so 16-bit would mean 65k outputs... They could probably do a secondary step that adds the less significant bits. The low sample rate is probably because the model was originally used for speech, and a lot of speech databases are at 16 kHz.
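To put rough numbers on that, here is a minimal Python sketch. It only assumes what the comment says (one output class per possible sample value); the coarse/fine split is one hypothetical way a "secondary step" for the less significant bits could look, not something the NSynth authors are known to do, and the function names are made up for illustration.

```python
from typing import Tuple


def output_classes(bits: int) -> int:
    """Number of distinct sample values, i.e. one network output per value."""
    return 2 ** bits


def split_sample(sample: int) -> Tuple[int, int]:
    """Split an unsigned 16-bit sample into a coarse (high) and fine (low) byte.

    Hypothetical two-stage scheme: predict the coarse byte first, then the
    fine byte, so each stage needs only 256 outputs.
    """
    if not 0 <= sample <= 0xFFFF:
        raise ValueError("expected an unsigned 16-bit sample")
    return sample >> 8, sample & 0xFF


def join_sample(coarse: int, fine: int) -> int:
    """Recombine the two bytes into the original 16-bit sample."""
    return (coarse << 8) | fine


if __name__ == "__main__":
    print(output_classes(8))       # outputs needed for 8-bit audio
    print(output_classes(16))      # outputs needed for a single 16-bit stage
    print(2 * output_classes(8))   # outputs needed for two 8-bit stages
    coarse, fine = split_sample(0xABCD)
    print(hex(coarse), hex(fine), hex(join_sample(coarse, fine)))
```

So a single 16-bit stage needs 65,536 outputs, while two 8-bit stages would need 512 in total, at the cost of running a second prediction per sample.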
It's probably a similar reason why 8 bit homebrew computers are more popular than 16: the complexity isn't linear.
Yeah, granted, there are neural resynthesis packages which do function; they are just waaay too slow for realtime audio production at the moment (and probably will be for a long time, now that Moore's law is dead).