by astrange 1319 days ago
Memory compression is a generalization of swap, which only applies to dynamic memory; files on disk don't need it because you can just read them off the disk again.

The problem is that GPUs don't support virtual memory paging, so they can't read files, decompress, or swap anything unless you write that yourself, which is a lot slower.

Also, ML models (probably) can't be compressed because they already are compressed; learning and compression are the same thing!
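
One rough way to poke at this claim is to run a general-purpose compressor over a weight-like buffer and look at the ratio. The sketch below is only an illustration: it uses Gaussian noise as a stand-in for trained weights (an assumption, real checkpoints may behave differently), and `pack_float32` and `compression_ratio` are made-up helper names.

```python
import random
import struct
import zlib


def pack_float32(values):
    """Serialize a list of Python floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


def compression_ratio(data):
    """Compressed size divided by original size (1.0 means no savings)."""
    return len(zlib.compress(data, 9)) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: Gaussian noise stands in for trained weights here.
fake_weights = [rng.gauss(0.0, 0.02) for _ in range(50_000)]
# Deliberately redundant data, for contrast.
redundant = [0.5] * 50_000

weights_ratio = compression_ratio(pack_float32(fake_weights))
redundant_ratio = compression_ratio(pack_float32(redundant))

print(f"noise-like weights: {weights_ratio:.3f}")
print(f"redundant values:   {redundant_ratio:.3f}")
```

The noise-like buffer should barely shrink, since most of the mantissa bits look random to zlib, while the redundant one collapses to almost nothing. Pointing the same helper at a real checkpoint file would be the actual experiment.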

2 comments

Wait. This comment just blew my mind. Does that imply that you might be able to measure the efficiency of a model by its compressibility? Note that I'm trying to keep in mind that efficient and accurate are not the same thing. One could imagine evaluating a model on a 2D map of performance versus compressibility.
> Also, ML models (probably) can't be compressed because they already are compressed; learning and compression are the same thing!

I feel like they're kind of two sides of the same coin: learning is about putting more information in the same data, while compression is about putting the same information in less data.

I'm wondering if some lossy floating-point compressor (such as zfp) would work.

> I'm wondering if some lossy floating-point compressor (such as zfp) would work.

Well, apparently this can work: Stable Diffusion comes in 32-bit and 16-bit float versions. I'm kind of surprised they both work, but that's lossy compression.
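
To make the "16-bit is lossy compression" point concrete, here is a small sketch that round-trips weight-like values through half precision using only the standard library. The Gaussian values are an assumed stand-in for real weights, not anything taken from Stable Diffusion itself.

```python
import random
import struct

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Assumption: small Gaussian values stand in for model weights.
raw = [rng.gauss(0.0, 0.02) for _ in range(n)]

# Round to float32 first, so the comparison is float32 versus float16.
f32_bytes = struct.pack(f"<{n}f", *raw)
f32_values = struct.unpack(f"<{n}f", f32_bytes)

# "e" is the struct format code for IEEE 754 half precision.
f16_bytes = struct.pack(f"<{n}e", *f32_values)
f16_values = struct.unpack(f"<{n}e", f16_bytes)

# Largest absolute difference introduced by dropping to 16 bits.
max_abs_error = max(abs(a - b) for a, b in zip(f32_values, f16_values))

print(f"float32 size: {len(f32_bytes)} bytes")
print(f"float16 size: {len(f16_bytes)} bytes")
print(f"max abs error: {max_abs_error:.2e}")
```

The size halves exactly, and the error stays small for values in this range, which is presumably why both versions of the model are usable.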

Sure, but 16-bit float is pretty primitive compression, as it does not exploit any redundancy in the input. zfp groups numbers together in chunks, which means that correlated numbers can be represented more precisely. Its algorithm is described here: https://zfp.readthedocs.io/en/release1.0.0/algorithm.html#lo...
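
As a toy illustration of why grouping helps, here is a block floating-point sketch: four values share one exponent and each keeps a 14-bit signed mantissa, which adds up to the same 64 bits as four float16 values. This is not zfp's actual algorithm (the linked page describes the real pipeline); the constants and the `encode_block` / `decode_block` helpers are invented for the example, and the code only quantizes, it does not pack bits.

```python
import math
import struct

MANT_BITS = 14  # signed mantissa bits per value: 4 * 14 + 8 exponent bits = 64


def encode_block(values):
    """Quantize a block to one shared exponent plus small signed integers."""
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return 0, [0] * len(values)
    # frexp gives peak = m * 2**exp with 0.5 <= m < 1, so |v| / 2**exp < 1.
    _, exp = math.frexp(peak)
    scale = 2 ** (MANT_BITS - 1)
    limit = scale - 1  # largest magnitude that fits in the signed mantissa
    ints = [max(-limit, min(limit, round(v / 2.0 ** exp * scale))) for v in values]
    return exp, ints


def decode_block(exp, ints):
    """Invert encode_block, up to quantization error."""
    scale = 2 ** (MANT_BITS - 1)
    return [i / scale * 2.0 ** exp for i in ints]


def through_float16(values):
    """Round each value to IEEE 754 half precision and back."""
    n = len(values)
    return list(struct.unpack(f"<{n}e", struct.pack(f"<{n}e", *values)))


# Assumption: a block of correlated values with similar magnitudes.
block = [0.0101, 0.0102, 0.0103, 0.0104]

exp, ints = encode_block(block)
block_error = max(abs(a - b) for a, b in zip(block, decode_block(exp, ints)))
half_error = max(abs(a - b) for a, b in zip(block, through_float16(block)))

print(f"shared-exponent error: {block_error:.2e}")
print(f"float16 error:         {half_error:.2e}")
```

The shared exponent only pays off when the values in a block have similar magnitudes; a block mixing very large and very small numbers would lose the small ones, which is part of why real codecs do more than this.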

I would like to see whether zfp can be applied to something like Stable Diffusion (or other ML models) and give better results than regular floats at the same size.