by highd 3287 days ago
My thinking is that this is a good GAN problem. An L2 loss will have these bad trivial upscalings as local minima. Since L2 error in the time domain is the same as L2 error in the frequency domain (up to normalization, by Parseval's theorem), you can think of it in the frequency domain: there is basically a big black area above the cutoff to infill from very little information. With some sort of perceptual similarity measure, on the other hand, there would be lots of adjacent improvements in quality that reduce the error, which should make it easier to train. I think this matches the results seen in image upscaling, too.
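
To make the time/frequency equivalence concrete, here is a rough sketch (not from the linked project) that checks it on a toy signal. The signal length, the two tone bins, and the helper names `dft`, `l2_time`, and `l2_freq` are all made up for illustration, and the "trivial upscaling" is simply assumed to be the low tone with nothing above the cutoff.

```python
import cmath
import math

def dft(x):
    """Naive unnormalized DFT, O(n^2); fine for a toy check."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def l2_time(a, b):
    """Squared L2 error between two signals, computed sample by sample."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def l2_freq(a, b):
    """Same error computed bin by bin; the 1/n matches the unnormalized DFT."""
    n = len(a)
    fa, fb = dft(a), dft(b)
    return sum(abs(p - q) ** 2 for p, q in zip(fa, fb)) / n

# Hypothetical toy setup: 64 samples, a low tone in bin 3, a high tone in bin 20.
n = 64
low = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
high = [0.5 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]

target = [l + h for l, h in zip(low, high)]  # full-bandwidth signal
trivial = low                                # "upscaling" that leaves the top band empty

time_err = l2_time(target, trivial)
freq_err = l2_freq(target, trivial)
print(time_err, freq_err)  # the two values agree up to floating-point error
```

The whole error here is the energy of the missing high band, so an L2 loss only rewards filling that band with exactly the right content, which is the sense in which the empty band is an easy place to get stuck.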
1 comment

In fact, when you listen to the downsampled example, there is actually a lot of information in the extract, way more than enough. That's because frequency should be viewed on a log scale to be relevant to the human ear.

Here the frequency cutoff is 2 kHz, which is already a fairly high pitch.
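
A quick back-of-the-envelope sketch of the log-scale point. It assumes the commonly quoted 20 Hz to 20 kHz hearing range (not stated in the thread) and uses the standard MIDI convention that A4 = 440 Hz is note 69; the variable names are just for illustration.

```python
import math

# Assumed nominal hearing range; the thread only gives the 2 kHz cutoff.
HEARING_LOW_HZ = 20.0
HEARING_HIGH_HZ = 20_000.0
CUTOFF_HZ = 2_000.0

# On a log scale, bandwidth is counted in octaves (doublings of frequency).
octaves_below = math.log2(CUTOFF_HZ / HEARING_LOW_HZ)
octaves_above = math.log2(HEARING_HIGH_HZ / CUTOFF_HZ)
fraction_below = octaves_below / (octaves_below + octaves_above)

# On a linear scale, the same cutoff keeps only a small slice of the range.
linear_fraction_below = (CUTOFF_HZ - HEARING_LOW_HZ) / (HEARING_HIGH_HZ - HEARING_LOW_HZ)

# Nearest MIDI note to the cutoff, using A4 = 440 Hz = MIDI note 69.
nearest_midi = round(69 + 12 * math.log2(CUTOFF_HZ / 440.0))

print(f"octaves below cutoff: {octaves_below:.2f}")
print(f"octaves above cutoff: {octaves_above:.2f}")
print(f"log-scale fraction kept: {fraction_below:.2f}")
print(f"linear fraction kept: {linear_fraction_below:.2f}")
print(f"nearest MIDI note to cutoff: {nearest_midi}")
```

Under those assumptions, about two thirds of the audible octaves sit below the cutoff, even though it is roughly a tenth of the range in linear terms, and 2 kHz lands near the top of a piano keyboard.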