Hacker News
by joe_bleau 4139 days ago
A quick scan didn't reveal to me whether he's time aligning the signals during the subtraction. I've played with listening to the wav-mp3 signal before, and I seem to recall that the mp3 encoder would introduce a little delay.

I added a transient pulse in front of the music so that I could (visually) time align the signals before subtracting them.

5 comments

I wondered what my favourite test tracks would sound like, so I made a (somewhat stupid, and a bit slow...) program to produce the difference between an MP3 and a WAV: https://github.com/tom-seddon/bin/blob/master/find_mp3_resid...

(Dependencies: Python 2.7, lame, GNU make, mpg123, and (if you use FLAC files as input) flac. Tested on my Linux PC with 64-bit LAME 3.99.5 and mpg123 1.14.4 from the Debian stable package repository. Run with -h to get some "help".)

It uses lame to compress and mpg123 to decompress, and I don't know if there's something special going on but the output WAVs always seem to have the same number of samples as the original. And they seem to be aligned - if you use this program you'll find that the difference between WAV and 128kbps MP3 is somewhat noisy, but WAV vs 320kbps MP3 is pretty much silent.

(Or maybe you'll find something totally different! Who knows. I only tested this on my system.)
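For anyone who wants to see the core idea without the lame/mpg123 plumbing, here is a minimal sketch of the subtraction step. It is not the linked program: the function names `residual` and `rms` are made up for illustration, and the sample lists are toy data standing in for decoded PCM, not real encoder output. It assumes both signals are mono, already aligned, and the same length.

```python
import math

def residual(original, decoded):
    """Sample-by-sample difference between two aligned, equal-length signals."""
    if len(original) != len(decoded):
        raise ValueError("signals must have the same number of samples")
    return [a - b for a, b in zip(original, decoded)]

def rms(samples):
    """Root-mean-square level of a signal; 0.0 for empty input."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Toy data (hypothetical, NOT real encoder output): one "lossier" copy
# and one "cleaner" copy of the same five samples.
original = [0, 1000, -2000, 3000, -1000]
lossy_a = [0, 900, -1900, 3100, -1100]   # larger per-sample error
lossy_b = [0, 999, -2000, 3001, -1000]   # smaller per-sample error

print(rms(residual(original, lossy_a)))
print(rms(residual(original, lossy_b)))
```

A quieter residual shows up as a smaller RMS value, which is one rough way to put a number on "somewhat noisy" versus "pretty much silent".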

Neat, thanks for running the experiment to see how differing MP3 encoder settings affect the lost portion. This explains why 320kbps is generally accepted amongst DJs, as any loss is significantly less than that caused by the club sound system :P

I did a blind test when I was in my 20s, and while on a couple of tracks I could actually tell the difference between 320kbps and the original, I did have to concentrate. And I couldn't really have said that one was necessarily better than the other; the effect was as if one type of noise-y sound was being replaced with a subtly different type of sound with the same noise-y quality. Different, but overall the same.

Listening to the diff of one of those tracks today was interesting! All I can hear is the drums... and where the sound I'm thinking of plays, it sounds like rather quiet interference! But the drums as I recall sounded absolutely identical. Interesting that the ears can detect one thing but not the other.

(I didn't bother to re-run the full comparison, as I'm no longer in my 20s. One good (?) thing about getting old is that your hearing deteriorates, and issues such as this become moot. You can also afford the disk space to just compress everything at 320kbps. Then you don't have to worry about it, and it fits OK on your phone too.)

320kbps with highest quality setting is pretty much an industry standard now, and many DJs, myself included, make use of that.

As you've looked into this before, do you know what the similar difference is like for such professional-grade encoding?

(Note: no idea what mp3 encoder Audacity uses, and I'm sure the results will vary with encoder settings as well.)

I just fired up Audacity and generated a click track, with the first click at 1 second in. The exported 44.1k wav file, when loaded in Audacity, shows the click at exactly 44100 samples in.

The exported mp3 file, when loaded, shows the click to be around 46357 samples in. (It's a bit hard to measure, because the encoder has smeared the pulse.) That is 2257 samples, or somewhere between 51 and 52 ms, late relative to the wav file.

Listening to the wav and mp3 ticks summed, the delay is obvious--they are not in sync at all. Adding 2257 samples of silence to the front end of the wav file puts them back in audible sync.
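The arithmetic behind those numbers, as a small sketch. The click positions and sample rate are the ones quoted above; the helper `pad_front` is a hypothetical name, and the padding step only illustrates "adding silence to the front", not how Audacity does it.

```python
SAMPLE_RATE = 44100   # Hz, as stated for the exported wav
WAV_CLICK = 44100     # click position in the wav, in samples
MP3_CLICK = 46357     # approximate click position in the decoded mp3

# Encoder/decoder delay in samples and in milliseconds.
delay_samples = MP3_CLICK - WAV_CLICK
delay_ms = 1000.0 * delay_samples / SAMPLE_RATE

def pad_front(samples, n):
    """Return a copy of samples with n zero samples (silence) prepended."""
    return [0] * n + list(samples)

print(delay_samples)
print(round(delay_ms, 2))
```

Padding the wav by `delay_samples` moves its click to the same index as the mp3's, which is what makes the two sound in sync again.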

He is probably dealing with this, given that the audio piece is not just "tomsdiner.wav - tomsdiner.mp3". There's a lot of processing happening after that:

----

Verse one finds the narrator in a bustling diner, making observations about her environment. The focus of this text is external to its author, as opposed to later verses which exist in a more subjective, internal space. Using different settings to harvest the lost material, I was able to isolate both clear, pitched content and more ephemeral transient signals.

Using the python library headspace, and a reverb model of a small diner, I began to construct a virtual 3-d space. Beginning by fragmenting and scrambling the more transient material, I applied head related transfer functions to simulate the background conversation one might hear in a diner. Tracking the amplitude of the original melody in the verse, I applied a loose amplitude envelope to these signals. Thus, a remnant of the original vocal line comes through in its amplitude contour.

Having constructed this background, prominent pitches from the original melody appear and disappear, located variously in this virtual space. These ephemeral sounds hint at a familiar melody, playing with aural memory and imagination, a flickering apparition hovering at the border of consciousness.

----

- found near the bottom of http://theghostinthemp3.com/theghostinthemp3.html

That seems to me to mean that the author composed new audio, and isn't presenting "wav minus mp3".

That would explain the phasing/flanging-like sound which gives the ghost recording such an eerie feel.

You could solve this automatically by cross-correlating the MP3 output with the original signal over a range of time shifts and taking the shift with the strongest match.
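A rough sketch of that correlation approach, assuming the MP3 output is a delayed (and slightly altered) copy of the original. The function name `best_lag` and the toy signals are made up for illustration. This is the brute-force form; for full-length tracks an FFT-based correlation would be the more practical choice.

```python
def best_lag(reference, delayed, max_lag):
    """Return the lag in [0, max_lag] that maximizes the correlation
    sum of reference[i] * delayed[i + lag]."""
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Only sum over indices that exist in both signals at this lag.
        n = min(len(reference), len(delayed) - lag)
        score = sum(reference[i] * delayed[i + lag] for i in range(n))
        if score > best_score:
            best, best_score = lag, score
    return best

# Toy data (hypothetical): the "decoded" signal is the reference with
# three samples of silence in front, mimicking encoder delay.
reference = [0, 0, 5, -3, 2, 0, 0, 0]
delayed = [0, 0, 0] + reference

lag = best_lag(reference, delayed, max_lag=5)
aligned = delayed[lag:lag + len(reference)]
print(lag)
```

Once the lag is known, dropping that many samples from the front of the decoded signal lines it up with the original before subtracting. With real encoder output the match won't be exact, so the peak is broader, but the strongest lag should still land at the delay.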