Hacker News
by mrob 2236 days ago
This isn't bad, but the note decays sound noticeably different. My guess is that the NN doesn't know that human ears have a non-linear response that makes them more sensitive to errors in the decay than in the attack, so it treats them equivalently. If this is the case, then it might be fixable by using logarithmic-scale audio samples instead of linear ones.

The non-linearity of the ear is frequency dependent[0], but in practice I suspect it would be sufficient to pre-process the linear PCM data with x=sqrt(x) and undo before playback with x=x^2.
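
A minimal sketch of that pre-processing, assuming the samples are floats normalized to [-1, 1]. PCM is signed, so the square root is applied to the magnitude and the sign is restored afterwards. That detail and the names `compress` and `expand` are assumptions for illustration, not something taken from the original setup.

```python
import math


def compress(x: float) -> float:
    """Sign-preserving square root: boosts quiet samples relative to loud ones."""
    return math.copysign(math.sqrt(abs(x)), x)


def expand(y: float) -> float:
    """Inverse of compress: sign-preserving square, applied before playback."""
    return math.copysign(y * y, y)


# Hypothetical normalized samples, from full-scale negative to full-scale positive.
samples = [-1.0, -0.25, 0.0, 0.01, 0.25, 1.0]
encoded = [compress(s) for s in samples]   # what the NN would be trained on
decoded = [expand(e) for e in encoded]     # what would be sent to playback

for s, e, d in zip(samples, encoded, decoded):
    print(f"{s:+.4f} -> {e:+.4f} -> {d:+.4f}")
```

With this mapping an error of a given size in the encoded domain corresponds to a smaller absolute error for quiet samples than for loud ones, which is the point of the suggestion.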

[0] https://en.wikipedia.org/wiki/Equal-loudness_contour

2 comments

I came into the comments to say the same thing. To my ears, the NN versions roll off unnaturally at the end and that makes them really easy to identify as artificial.
Why square root and not log?
Cheap and dirty fast calculation. I don't actually know what the best mapping is, so I'd start with this.
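
For comparison, a logarithmic mapping could look like the sketch below. It uses mu-law companding, a standard log-style curve from telephony, because a plain log is undefined at zero. The choice of mu-law, the value mu = 255, and all function names are assumptions. Nothing in the thread says which mapping works best for the NN.

```python
import math

MU = 255.0  # assumption: the value commonly used in 8-bit telephony mu-law


def sqrt_compress(x: float) -> float:
    """Sign-preserving square root, the cheap mapping suggested above."""
    return math.copysign(math.sqrt(abs(x)), x)


def mu_law_compress(x: float, mu: float = MU) -> float:
    """Log-style compression of a sample in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: float = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


# Print both curves side by side for a few hypothetical sample values.
for x in [0.001, 0.01, 0.1, 0.25, 1.0]:
    print(f"x={x:<6} sqrt={sqrt_compress(x):.4f} mu-law={mu_law_compress(x):.4f}")
```

Both curves map [-1, 1] onto itself and are invertible, so either could be swapped in for the pre-processing step and undone before playback.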