by the8472 2333 days ago
Don't x86's fancy addressing modes provide most of those multiplications and offsets with very little IPC penalty?

Yeah, this should be roughly the same overhead as an ADD:

    LEA rDest, [rBase + 8*rPtr]
(The "load effective address" instruction computes an effective address like a load or store would, but just gives the address without doing a memory access.)
AIUI, mov supports these addressing modes directly[0], and if I read the instruction tables correctly, then at least on Skylake the latency and throughput are the same for all addressing modes[1].

[0] http://www.c-jump.com/CIS77/ASM/Addressing/lecture.html#R77_...
[1] https://www.agner.org/optimize/instruction_tables.pdf (page 238)

Decompression isn't the problem, compression is. Today compression is just a mov; with a scaled scheme we need additional shifts.
Also, we'll probably lose some of the cache benefits of compression because of the larger alignment.
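
To make the asymmetry concrete, here is a minimal C sketch of both directions. It assumes a hypothetical scheme with 8-byte-aligned objects and 32-bit offsets from a heap base. The names `compress_ptr`, `decompress_ptr` and `PTR_SHIFT` are made up for illustration and are not taken from any real runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: objects are 8-byte aligned, so the low 3 bits of every
   offset from the heap base are zero and can be shifted away. */
#define PTR_SHIFT 3

/* Compress: subtract the base, then shift right. In an unshifted scheme
   with a 4 GiB-aligned base, this step could reduce to keeping the low
   32 bits; the shift is the extra work. */
static uint32_t compress_ptr(uint64_t base, uint64_t ptr) {
    uint64_t offset = ptr - base;
    assert((offset & ((1u << PTR_SHIFT) - 1)) == 0); /* must be aligned */
    assert((offset >> PTR_SHIFT) <= UINT32_MAX);     /* must fit in 32 bits */
    return (uint32_t)(offset >> PTR_SHIFT);
}

/* Decompress: base + 8 * compressed, the same shape as the
   [rBase + 8*rPtr] operand in the LEA example above. */
static uint64_t decompress_ptr(uint64_t base, uint32_t compressed) {
    return base + ((uint64_t)compressed << PTR_SHIFT);
}
```

With `base = 0x100000000` and `ptr = base + 64`, `compress_ptr` returns 8, and `decompress_ptr(base, 8)` gives back `base + 64`. Whether a compiler actually folds the decompression into a single scaled addressing mode depends on the compiler and the surrounding code.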