Hacker News
by devcat 1320 days ago
My favorite song for testing these audio compression algorithms is Adele’s “Hello” because any changes to her voice instantly pop out.

It did a great job of reducing the FLAC from 100 MB to less than 1 MB using the stereo 24 kbps preset, but audio quality suffers a lot in some places. Maybe training it at 48 kbps or 64 kbps would make it a feasible alternative for storing music without much quality loss.

In comparison, LAME's insane preset (320 kbps) produces a 12 MB MP3 with quality almost indistinguishable from the FLAC.
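
As a sanity check on those sizes, here is a rough back-of-the-envelope sketch. It assumes constant bitrate, decimal megabytes, and no container overhead; the track length is not stated anywhere above, so it is inferred from the 12 MB / 320 kbps figure. The function names are made up for illustration.

```python
# Rough size estimate: bytes = bitrate (bits/s) * duration (s) / 8.
# ASSUMPTION: the track duration is inferred from the 12 MB / 320 kbps
# figures; it is not given in the comment.

def size_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Size of a constant-bitrate stream, ignoring container overhead."""
    return bitrate_kbps * 1000 * seconds / 8

def implied_duration_s(size_mb: float, bitrate_kbps: float) -> float:
    """Duration implied by a file size (decimal MB) at a constant bitrate."""
    return size_mb * 1_000_000 * 8 / (bitrate_kbps * 1000)

track_s = implied_duration_s(12, 320)  # 300 s, i.e. about 5 minutes

for kbps in (24, 48, 64, 320):
    print(f"{kbps:>3} kbps -> {size_bytes(kbps, track_s) / 1e6:.2f} MB")
```

Under those assumptions 24 kbps comes out at 0.90 MB, which matches the "less than 1 MB" result, and 48 or 64 kbps would still only be about 1.8 to 2.4 MB.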

For those who want to listen to a sample: https://a.pomf.cat/pcjynr.wav (first FLAC, then EnCodec)

3 comments

I wonder how well this would work as the basis of a FLAC-like lossless encoder. FLAC works by approximating the audio stream with a lossy linear predictive code, and then storing the LPC encoding and its residuals (i.e. the delta between the original signal and its lossy approximation). It turns out that LPC+residuals are a lot more amenable to lossless compression (FLAC uses Rice coding for the residuals) than the raw audio signal itself. If the LPC were replaced with this neural network based encoding, would the resulting encoding+residuals also be amenable to lossless compression?
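
To make the "prediction + residual" idea concrete, here is a toy sketch. It is not the FLAC bitstream: it assumes a single fixed order-2 predictor and a hand-picked Rice parameter `k`, and the test tone and all function names are invented for illustration.

```python
# Toy "predict, then code the residual" round trip (NOT real FLAC).
# ASSUMPTIONS: fixed order-2 predictor, one Rice parameter k >= 1,
# a synthetic 440 Hz tone standing in for audio.
import math

def predict(prev2: int, prev1: int) -> int:
    """Linearly extrapolate the next sample from the previous two."""
    return 2 * prev1 - prev2

def residuals(samples: list[int]) -> list[int]:
    """First two samples verbatim, then prediction errors."""
    out = list(samples[:2])
    for n in range(2, len(samples)):
        out.append(samples[n] - predict(samples[n - 2], samples[n - 1]))
    return out

def reconstruct(res: list[int]) -> list[int]:
    """Invert residuals(): rerun the predictor and add the errors back."""
    out = list(res[:2])
    for n in range(2, len(res)):
        out.append(res[n] + predict(out[n - 2], out[n - 1]))
    return out

def zigzag(v: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return 2 * v if v >= 0 else -2 * v - 1

def unzigzag(u: int) -> int:
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)

def rice_encode(values: list[int], k: int) -> str:
    """Unary quotient, a 0 terminator, then k remainder bits per value."""
    bits = []
    for v in values:
        u = zigzag(v)
        q, r = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0" + format(r, f"0{k}b"))
    return "".join(bits)

def rice_decode(bits: str, k: int, count: int) -> list[int]:
    values, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1  # skip the 0 terminator
        r = int(bits[pos:pos + k], 2)
        pos += k
        values.append(unzigzag((q << k) | r))
    return values

samples = [round(8000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(1000)]
bits = rice_encode(residuals(samples), k=4)
decoded = reconstruct(rice_decode(bits, k=4, count=len(samples)))

print("lossless:", decoded == samples)
print("coded bits:", len(bits), "vs raw 16-bit:", 16 * len(samples))
```

The scheme only pays off when the residuals are small, since the Rice code spends one extra bit for every multiple of 2^k in the residual's magnitude. That is the property a neural predictor would have to preserve.
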
I think the main difficulty is that a neural decoder is allowed to make up lots of plausible phase information, which likely leads to pretty large L2 errors while retaining perceptual quality. So then you'll end up with large residuals even though you might only barely discern the difference perceptually.
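
A minimal sketch of that phase argument, assuming a pure tone and a hypothetical decoder that reproduces the right frequency and amplitude but is 90 degrees off in phase (a deliberately extreme case, chosen only for illustration):

```python
# Same tone, different phase: similar to the ear for a steady tone,
# but the sample-wise residual is large.
# ASSUMPTION: the "decoder" here is just a phase-shifted copy of the input.
import math

N = 1000      # number of samples
CYCLES = 10   # whole number of cycles, so the averages below are clean

def tone(phase: float) -> list[float]:
    return [math.sin(2 * math.pi * CYCLES * n / N + phase) for n in range(N)]

def power(x: list[float]) -> float:
    """Mean squared value of the signal."""
    return sum(v * v for v in x) / len(x)

original = tone(0.0)
decoded = tone(math.pi / 2)  # same frequency and amplitude, phase off by 90 degrees
residual = [a - b for a, b in zip(original, decoded)]

print("signal power:  ", power(original))   # about 0.5
print("residual power:", power(residual))   # about 1.0
```

In this setup the residual carries about twice the power of the signal itself, so storing it losslessly would cost more than storing the original samples.
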
Interesting that “Hello” could be this generation’s “Tom’s Diner”:

https://www.mentalfloss.com/article/19727/how-toms-diner-tun...

Really interesting results, thanks for sharing!

The sound of the AI encoder is distinct and probably not suited for music at that bitrate, but would probably serve fine for Facebook videos and podcasts. I'd be really interested in seeing a model optimized to compress human speech...