HN Mirror


by mrob 2236 days ago

This isn't bad, but the note decays sound noticeably different. My guess is that the NN doesn't know that human ears have a non-linear response that makes them more sensitive to errors in the decay than in the attack, so it treats them equivalently. If this is the case then it might be fixable by using logarithmic-scale audio samples instead of linear ones.

The non-linearity of the ear is frequency dependent[0], but in practice I suspect it would be sufficient to pre-process the linear PCM data with x=sqrt(x) and undo before playback with x=x^2.

[0] https://en.wikipedia.org/wiki/Equal-loudness_contour
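A minimal sketch of the sqrt pre-processing suggested above, assuming the PCM samples are floats normalized to [-1, 1]. The comment does not say how to handle negative samples, so carrying the sign through with `copysign` is an assumption here, and the names `compress` and `expand` are hypothetical.

```python
import math

def compress(samples):
    """Signed square root: sqrt of the magnitude, original sign kept.

    Assumption: samples are floats in [-1, 1]. A plain sqrt(x) would
    fail on the negative half of the waveform.
    """
    return [math.copysign(math.sqrt(abs(x)), x) for x in samples]

def expand(samples):
    """Inverse of compress: signed square, applied before playback."""
    return [math.copysign(x * x, x) for x in samples]

# Quiet samples (like a note's decay) are pushed away from zero,
# so an error of a given size is relatively smaller there.
pcm = [-1.0, -0.25, -0.01, 0.0, 0.01, 0.25, 1.0]
encoded = compress(pcm)
decoded = expand(encoded)

print(encoded)
print(decoded)
```

The network would be trained on `encoded`-style data and its output passed through `expand`. Whether this actually improves the decays is untested here.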

2 comments

thesausageking 2236 days ago

I came into the comments to say the same thing. To my ears, the NN versions roll off unnaturally at the end and that makes them really easy to identify as artificial.


rubatuga 2236 days ago

Why square root and not log?


mrob 2236 days ago

Cheap and dirty fast calculation. I don't actually know what the best mapping is, so I'd start with this.
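For comparison, here is a rough sketch of what a log mapping could look like next to the square root. A plain log(x) is undefined at zero, so this uses the mu-law form log(1 + mu|x|) instead. That choice, the value mu = 255, and all function names are assumptions for illustration, not something proposed in the thread.

```python
import math

MU = 255.0  # assumption: the value commonly used for 8-bit mu-law

def mu_law_compress(x, mu=MU):
    """Log-style companding of one sample in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def sqrt_compress(x):
    """Signed square root, for side-by-side comparison."""
    return math.copysign(math.sqrt(abs(x)), x)

# Print both mappings for a few amplitudes to see how each one
# stretches the quiet end of the range.
for x in (0.001, 0.01, 0.1, 1.0):
    print(x, round(sqrt_compress(x), 4), round(mu_law_compress(x), 4))
```

Both mappings are invertible and fix 0 and 1. The square root needs only one sqrt per sample and a multiply to undo, which fits the "cheap" argument. Which curve suits the ear better is left open, as in the comment.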
