It kind of reads like you have discredited frequency domain compression algorithms. FYI JPEG used a block-based DCT which is very similar to a Fourier transform.
That was not my intent. I just noted that my approach came from visually inspecting spectrograms and noting that they might be compressed quite well by image compression algorithms. Only to notice in the end that lossy image compression completely butchers the signal with noise if doing that. Had I just used the original waveform as raw image data that wouldn't have been as much of a problem. Compression would still have changed the samples, but overall probably not as badly.
Heck, I've been 17 when I did that. I certainly didn't really know much about the theory behind or why it almost certainly would fail. But I guess by now I wouldn't even attempt such craziness, so in a way it's probably a good thing to not know things too well, sometimes.
The problem is that we are much more sensitive to audio distortion than visual, particularly dynamic (e.g.video). Pretty much all lossy audio compression does a bad job without a psycho-acoustic model, whereas image compression works pretty well on simple metrics. It doesn't help that psycho-visual modelling is less well understood (than audio).
Heck, I've been 17 when I did that. I certainly didn't really know much about the theory behind or why it almost certainly would fail. But I guess by now I wouldn't even attempt such craziness, so in a way it's probably a good thing to not know things too well, sometimes.