| HN Mirror



	by svantana 3284 days ago
	I'm sorry but is the Deep Learning Hype strong enough to warp people's sensory perception? Every sample on this page sounds terrible IMHO, and pretty much what you would get if you would spend 10 minutes implementing the most naive spectrogram resynthesis you could think of. Granted, there is great promise in finding the "manifold of music", which seems to be the goal here, but what they show is just not anywhere near that promise.

4 comments

anigbrowl 3284 days ago

Agreed. The texture is nice - I enjoy a low-fi sound - but the fun of sound engineering is building your own signal paths to modulate or destroy sound interactively. The more abstracted the sound generation method, the more of a toy and the less of a tool it is, because the rising non-linearities make it increasingly difficult to pursue a specific objective. This has always been a limiting factor for FM, where undirected noodling can certainly yield interesting results, but not very controllable ones beyond 3 or 4 operators.

I do think it's interesting and valuable work. But it's worth bearing in mind that there's no shortage of great resynthesis tools already, and that musicians are besieged with offers from technologists for Sounds! That! Have! Never! Been! Possible! Before! While you can always rely on Jordan Rudess to provide a celebrity endorsement to the keyboard collector crowd, most hobbyist musicians eventually get over chasing novelty and end up reducing their equipment load to a smaller number of really well-engineered devices or software tools that they really like and get to know inside out.


mbell 3284 days ago

The 'cello' and 'laaa...' actually made me quickly remove my headphones. Having 'character' is not even close to how I would describe these.


shams93 3284 days ago

They're using very low quality sample rates, 8 bit, not pretty. Until it can do 32-bit samples it's going to sound horrible.


SwellJoe 3284 days ago

I've read the articles about NSynth with interest, but I can't figure out why they're using 8-bit and low sample rates. Surely, it's not that much more computationally intensive that they can't tinker at 8 bits and then do a render at a high resolution once they've settled on some parameters they like.


doomlaser 3284 days ago

Possibly the same reason all the Style Transfer implementations use very low resolution images? All the neural net applications I've seen seem to have problems with high resolutions in any form.


svantana 3284 days ago

The 8-bit is actually reasonable: they have one output per possible value, so 16 bit would mean 65k outputs... They could probably do a secondary step that adds less significant bits. The low samplerate is probably because it's originally used for speech, and a lot of speech databases are in 16 kHz.
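
A rough sketch of the arithmetic behind that point, assuming a WaveNet-style model that predicts each sample as one class out of 2^bits and uses mu-law companding to fit audio into 8 bits (the thread does not spell out either detail, and the function names here are made up for illustration):

```python
import math

def output_classes(bit_depth: int) -> int:
    """Number of categorical outputs if every sample value gets its own class."""
    return 2 ** bit_depth

def mu_law_encode(x: float, bits: int = 8) -> int:
    """Compand a sample in [-1, 1] with mu-law and quantize it to a class index."""
    mu = 2 ** bits - 1
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((y + 1) / 2 * mu + 0.5)  # map [-1, 1] to 0..mu

def mu_law_decode(index: int, bits: int = 8) -> float:
    """Invert mu_law_encode, up to quantization error."""
    mu = 2 ** bits - 1
    y = 2 * index / mu - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

if __name__ == "__main__":
    for bits in (8, 16):
        print(bits, "bits ->", output_classes(bits), "outputs per sample")
    idx = mu_law_encode(0.5)
    print("0.5 -> class", idx, "->", mu_law_decode(idx))
```

Going from 8 to 16 bits multiplies the size of the output layer by 256, which is the cost being described; the companding step is one common way to make 256 levels go further than plain linear 8-bit quantization would.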


mysterydip 3284 days ago

It's probably a similar reason why 8 bit homebrew computers are more popular than 16: the complexity isn't linear.


microcolonel 3284 days ago

Yeah, granted there are neural resynthesis packages which do function, they are just waaay too slow for realtime audio production at the moment (and probably will be for a long time, now that Moore's law is dead).
