Hacker News
by orlp 1323 days ago
Our world is near-limitless in bandwidth, but highly constrained in cache size and latency. 10x compression means you can keep 10x more stuff in cache.

And it doesn't matter what level you operate on: cache and latency are always relevant. Whether it's registers, L1, L2, L3, same-core NUMA RAM, cross-core, SSD, disk controller cache, disk, same-location distribution server, cross-location distribution server, tape archive backup, etc., going up a level of cache is always a lot slower, regardless of bandwidth, if you're doing a small read.
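
To put rough numbers on the small-read point, here is a minimal sketch using the usual "fixed latency plus transfer time" model. The tier figures (100 microseconds of latency, 1 GB/s of bandwidth) and the names `read_time`, `LATENCY_S`, and `BANDWIDTH_BPS` are made up for illustration, not measurements of any real hardware.

```python
def read_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Time for one read: a fixed latency cost plus the transfer time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

# Hypothetical storage tier (illustrative values only).
LATENCY_S = 100e-6      # 100 microseconds per request
BANDWIDTH_BPS = 1e9     # 1 GB/s

# A 64-byte read, before and after a 10x bandwidth upgrade.
small = read_time(64, LATENCY_S, BANDWIDTH_BPS)
small_fast_link = read_time(64, LATENCY_S, 10 * BANDWIDTH_BPS)

# Fraction of the small read's time that is pure latency.
latency_share = LATENCY_S / small

print(f"64 B read: {small * 1e6:.3f} us, with 10x bandwidth: {small_fast_link * 1e6:.3f} us")
print(f"latency share of the small read: {latency_share:.4%}")
```

Under these assumed numbers, 10x more bandwidth barely changes the time of a small read, because almost all of it is latency.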

1 comment

Keep in mind that you also need to put the model weights somewhere...
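
A rough way to see that trade-off, assuming the model has to sit in the same cache level as the compressed data. The function name `effective_capacity` and all sizes below are hypothetical, chosen only to illustrate the arithmetic.

```python
def effective_capacity(cache_bytes, ratio, model_bytes=0):
    """Uncompressed bytes that fit in a cache after reserving room for the model.

    Assumes the model occupies the same cache level as the data and that
    the compression ratio is uniform. Both are simplifications.
    """
    usable = max(cache_bytes - model_bytes, 0)
    return usable * ratio

CACHE = 32 * 1024  # hypothetical 32 KiB cache

# Ideal case from the parent comment: 10x compression, model costs nothing.
ideal = effective_capacity(CACHE, 10)

# Same ratio, but the model takes up half of the cache.
with_model = effective_capacity(CACHE, 10, model_bytes=16 * 1024)

print(ideal, with_model)
```

With these made-up sizes, a model that eats half the cache halves the gain, and a model larger than the cache leaves no room at that level at all.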