by dale_glass 1362 days ago
We support that already, yup. But it never hurts to see if there's something better than that out there.

You can bootleg your own fast lossless codec by delta-encoding the raw PCM to get a lot of zeros and then feeding it through an off-the-shelf fast compressor like snappy/lz4/zstandard/etc. It won't get remotely close to the dedicated audio algorithms, but I wouldn't be surprised if you cut your data size by a factor of 2-4 at essentially no CPU cost compared to raw uncompressed audio.
You've not done this before, have you?
I haven't, but now I have. I took https://opus-codec.org/static/examples/samples/music_orig.wa... from https://opus-codec.org/examples/. Then I wrote the following snippet of Python code:

    from scipy.io import wavfile
    import numpy as np
    import zstd

    sampling_rate, samples = wavfile.read(r'data/bootleg-compress/music_orig.wav')
    orig = samples.tobytes()

    naive_compressed = zstd.ZSTD_compress(orig)
    deltas = np.diff(samples, prepend=samples.dtype.type(0), axis=0) # Per-channel deltas.
    compressed_deltas = zstd.ZSTD_compress(deltas.ravel()) # Interleave channels and compress.

    decompressed_deltas = np.frombuffer(zstd.ZSTD_uncompress(compressed_deltas), dtype=samples.dtype)
    decompressed = np.cumsum(decompressed_deltas.reshape(deltas.shape), axis=0, dtype=samples.dtype)
    assert np.array_equal(samples, decompressed)

    print(len(orig))
    print(len(naive_compressed))
    print(len(compressed_deltas))
giving:

    17432876
    15518973
    12817602
Looks like my initial estimation of 2-4 was way off (when FLAC achieves ~2 this should've been a red flag), but you do get a ~1.36x reduction in space at basically memory read speed.
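For reference, the ratios follow directly from the three sizes printed above. A quick check (the variable names are just for illustration):

```python
# Sizes printed by the snippet above, in bytes.
orig_size = 17432876
naive_size = 15518973   # zstd on the raw PCM
delta_size = 12817602   # zstd on the first-order deltas

naive_ratio = orig_size / naive_size
delta_ratio = orig_size / delta_size
print(f"{naive_ratio:.2f}x {delta_ratio:.2f}x")  # prints "1.12x 1.36x"
```

So zstd alone barely helps on raw PCM, and most of the gain comes from the delta step.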

Using an encoding for second-order differences that stores -127 <= d <= 127 in 1 byte and everything else in 3 bytes (an escape byte plus 2 bytes, for 16-bit input audio), I got a ratio of ~1.50 for something that can still operate entirely at RAM speed:

    orig = samples.tobytes()
    deltas = np.diff(samples, prepend=samples.dtype.type(0), axis=0)      # Per-channel deltas.
    delta_deltas = np.diff(deltas, prepend=samples.dtype.type(0), axis=0) # Per-channel second-order differences.

    # Most differences are small: encode those in [-127, 127] using 1 byte,
    # and larger ones using 3 bytes (escape marker 255 plus 2 bytes).
    # Interleave channels and encode.
    small = np.sum(np.abs(delta_deltas.ravel()) <= 127)
    bootleg = np.zeros(small + (len(delta_deltas.ravel()) - small) * 3, dtype=np.uint8)
    i = 0
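    for dda in delta_deltas.flatten():
        dda = int(dda)  # Plain Python int, so the +2**15 offset below cannot overflow int16.
        if -127 <= dda <= 127:
            bootleg[i] = dda + 127
            i += 1
        else:
            bootleg[i] = 255                       # Escape marker: a full 16-bit value follows.
            bootleg[i + 1] = (dda + 2**15) % 256   # Low byte.
            bootleg[i + 2] = (dda + 2**15) // 256  # High byte.
            i += 3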

    compressed_bootleg = zstd.ZSTD_compress(bootleg)
    print(len(compressed_bootleg))

    decompressed_bootleg = zstd.ZSTD_uncompress(compressed_bootleg)
    result = []

    i = 0
    while i < len(decompressed_bootleg):
        if decompressed_bootleg[i] < 255:
            result.append(decompressed_bootleg[i] - 127)
            i += 1
        else:
            lo = decompressed_bootleg[i + 1]
            hi = decompressed_bootleg[i + 2]
            result.append(256*hi + lo - 2**15)
            i += 3

    decompressed_delta_deltas = np.array(result, dtype=samples.dtype).reshape(delta_deltas.shape)
    decompressed_deltas = np.cumsum(decompressed_delta_deltas, axis=0, dtype=samples.dtype)
    decompressed = np.cumsum(decompressed_deltas, axis=0, dtype=samples.dtype)
    assert np.array_equal(samples, decompressed)
Prints 11593846.
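The Python loops above are of course nowhere near RAM speed themselves; they only demonstrate the byte layout. As a rough sketch, the encoder can be vectorized with NumPy while keeping the same layout. The names `ESCAPE`, `encode_escape` and `decode_escape` are made up for this illustration and are not from any library. The decoder is left as a plain loop, because 255 can also occur as a payload byte, so escape positions have to be found sequentially.

```python
import numpy as np

ESCAPE = 255  # Marker byte: the next two bytes hold a full 16-bit value.

def encode_escape(values: np.ndarray) -> np.ndarray:
    """1 byte for |v| <= 127, otherwise ESCAPE + low byte + high byte."""
    v = values.ravel().astype(np.int32)   # Widen so the offsets cannot overflow.
    small = np.abs(v) <= 127
    big = ~small
    sizes = np.where(small, 1, 3)
    starts = np.cumsum(sizes) - sizes     # Output offset of each value's first byte.
    out = np.empty(int(sizes.sum()), dtype=np.uint8)
    out[starts[small]] = (v[small] + 127).astype(np.uint8)  # -127..127 -> 0..254
    shifted = v[big] + 2**15                                # 0..65535
    out[starts[big]] = ESCAPE
    out[starts[big] + 1] = (shifted % 256).astype(np.uint8)
    out[starts[big] + 2] = (shifted // 256).astype(np.uint8)
    return out

def decode_escape(data, dtype=np.int16) -> np.ndarray:
    """Inverse of encode_escape; returns a flat array."""
    data = bytes(data)  # Indexing bytes yields plain Python ints.
    result = []
    i = 0
    while i < len(data):
        if data[i] != ESCAPE:
            result.append(data[i] - 127)
            i += 1
        else:
            result.append(data[i + 1] + 256 * data[i + 2] - 2**15)
            i += 3
    return np.array(result, dtype=dtype)

# Round-trip check on a few edge cases.
demo = np.array([0, -3, 127, 128, -32768, 32767], dtype=np.int16)
packed = encode_escape(demo)
assert np.array_equal(decode_escape(packed), demo)
```

With this, `encode_escape(delta_deltas)` would produce the same bytes as the `bootleg` array above, and the decoded result needs the same `.reshape(delta_deltas.shape)` before the two `cumsum` calls.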
While I also want a low-computation codec that can save space, the historical use cases unfortunately assume trading a lot more CPU power for a lot less bandwidth, so there's little research in this area. There's also no real incentive to make something like ProRes or DNxHD for audio: if you are editing audio, SSDs are now so fast that you'll run into CPU problems first.