Hacker News new | ask | show | jobs
by _nalply 685 days ago
Would it make sense to compress code and data with something like zstd and let the CPU decompress?

Of course this means a large change of how computers work but perhaps it is possible to make this opt-in (i.e. backwards compatible) for software?

4 comments

This would make memory read performance much, much more unpredictable, so it is a no-go from the start. And beyond that, the problem is not one of bandwidth, it is one of latency. This would increase bandwidth sometimes, but it would increase latency always, which is a terrible trade-off. Missed branch predictions would cost even more than they do today, for example.
This idea isn't about compressing in-flight, but in the instruction cache, so that more instructions will fit into the cache, and you don't need to fetch as often (and thus incur latency) from main memory / L2.

Zstd is impractical, but I can imagine some sort of storage efficient microcode? (current Intel CPUs store original x86 instructions in the L1 instruction cache).

You then need extra memory to store the de-compressed instructions, since you can't predict before running the decompression how many instructions you'll get. And you still the problem of an unpredictably-sized instruction cache.

Plus, the larger the instruction cache is, the worse every branch mis-prediction is. As far as I know, the size of the instruction cache is not really limited because of space issues, it's limited for precisely this reason.

Some instruction sets are variable length, there are also things like thumb and the 'compressed' riscv extension.

If you compress each instruction one at a time into a variable number of bits the ability to jump to any instruction is preserved but compression is hurt by not being able to use cross-instruction conditional probabilities and having to pack things into integer numbers of bits.

One could imagine a compression format that had 'jump points' -- compiler selection locations where it was possible to jump and decode from, so that you'd only take the above losses at potential jump targets.

You could go further an imagine the instruction set having some kind of constrained short jump forward a limited distance (by simply letting the decoder decode that way) or that can go back a limited distance without using a jump point so long as there was no way for controlflow to change out from under it.

I wonder what percentage of instructions would need to be jump points in such a system?

But I think this is mostly of academic interest: except in very small embedded devices code size is not a huge performance driver and like anything else you get diminishing returns from further optimizations. Variable length multiple-of-bytes instruction sets like x86 probably get a significant fraction of the potential compression gains and do so without a lot of complexity.

x86 is a variable-length encoding where more frequently used instructions tend to have shorter encodings. Thumb doesn't go as far, with only 2 and 4 -byte instructions. You can find old papers on Huffman encoding instructions.

More effective block compression schemes are harder to pull off because of branches.

The problem here is branches. since you can jump to pretty much any instruction, every instruction needs to be a valid place to start decoding which makes the idea non-tenable.