by dekhn 802 days ago
I've talked to several world-class computer scientists/bioinformaticians who realized, on their own, that decompression is the fastest way to fill the cache with data (because the amount of data streaming in is smaller, and CPUs are VERY fast at decompression).

One of the key innovations in the AMBER MD engine that made it work OK on cheaper systems was lossless floating point compression. It still impresses me that you can compress floats, send them over MPI, and decompress them, all with lower latency than the transport needs to send the uncompressed data.
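To make "lossless floating point compression" concrete, here is a minimal sketch of one common approach: transpose the bytes of each float32 so that sign/exponent bytes sit together, then run a general-purpose compressor over the result. This is an assumption for illustration, not AMBER's actual scheme, and the names `compress_floats` and `decompress_floats` are hypothetical.

```python
import struct
import zlib


def compress_floats(values):
    """Losslessly pack values as float32: byte-transpose, then deflate.

    Illustrative only; not the scheme used by AMBER or any specific library.
    """
    raw = struct.pack(f"<{len(values)}f", *values)
    # Group byte 0 of every float, then byte 1, and so on. The high bytes
    # (sign/exponent) of similar values repeat, which deflate can exploit.
    shuffled = b"".join(raw[i::4] for i in range(4))
    return zlib.compress(shuffled, 1)  # level 1: favor speed over ratio


def decompress_floats(blob, count):
    """Invert compress_floats; `count` is the number of float32 values."""
    shuffled = zlib.decompress(blob)
    raw = bytearray(4 * count)
    for i in range(4):
        # Scatter each byte plane back to its position in every float.
        raw[i::4] = shuffled[i * count:(i + 1) * count]
    return list(struct.unpack(f"<{count}f", bytes(raw)))
```

The round trip is bit-exact at float32 precision. How much it shrinks depends entirely on the data: smooth, similarly scaled values compress well, while noisy low-order mantissa bits barely compress at all.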

1 comment

Not just MPI over a network. We can compress floats, send them over NVLink or PCIe to another GPU in the same host, and decompress them, and that can be faster than sending the raw data between GPUs. That's the premise behind dietgpu: it's cheap compression with a modest compression ratio (output is roughly 0.6-0.9x of the original size), but it's extremely fast, with 100s of GB/s of throughput, the idea being that you're racing something that is similarly fast. General floating point data can be nearly incompressible or highly compressible; it really just depends on what is being passed around.

In general, interconnects are improving more slowly than compute on the CPU/GPU, and that gap can be exploited.
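A back-of-the-envelope model of that race, as a rough sketch: it assumes compression, transfer, and decompression run one after another with no overlap, that codec throughput is measured in uncompressed bytes, and that the numbers in the usage lines are made up for illustration rather than measured on any real link or codec. The function names are hypothetical.

```python
def raw_transfer_time(size_gb, link_gb_per_s):
    """Seconds to send the uncompressed buffer over the link."""
    return size_gb / link_gb_per_s


def compressed_transfer_time(size_gb, link_gb_per_s, ratio,
                             comp_gb_per_s, decomp_gb_per_s):
    """Seconds to compress, send, and decompress, with no pipelining.

    `ratio` is compressed size / original size (e.g. 0.75).
    Codec throughputs are in GB/s of uncompressed data.
    """
    return (size_gb / comp_gb_per_s
            + ratio * size_gb / link_gb_per_s
            + size_gb / decomp_gb_per_s)


def break_even_codec_throughput(link_gb_per_s, ratio):
    """Minimum codec speed (same for both directions) at which compression ties.

    Derived from 2 / T = (1 - ratio) / link, i.e. codec time equals link time saved.
    """
    return 2.0 * link_gb_per_s / (1.0 - ratio)


# Hypothetical numbers: 1 GB buffer, 50 GB/s link, 0.75x ratio, 500 GB/s codec.
t_raw = raw_transfer_time(1.0, 50.0)
t_comp = compressed_transfer_time(1.0, 50.0, 0.75, 500.0, 500.0)
needed = break_even_codec_throughput(50.0, 0.75)
```

Under these assumptions the codec has to be several times faster than the link before a 0.75x ratio pays off, which is why the ratio can be mediocre as long as the throughput is very high. Overlapping compression with the transfer would lower the bar further.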