by rot13xor 1238 days ago
I always wondered why x86 has LEA when its functionality can be replicated by ADD. It has to do with LEA and ADD being able to run in parallel because LEA uses a separate ALU in the address calculation part of the chip, not the main ALU.

LEA only got more powerful in later models as the restrictions on registers were removed and more addressing modes were added. Now it can do two additions and a multiplication in a single instruction. Reusing the address-calculation hardware and instruction format that way is a clever ISA decision.
Wait, so if you were imaginative enough with how you use registers to calculate addresses you could abuse LEA as a DSP MAC (multiply/accumulate) instruction?
Compilers do it all the time. For example, GCC compiles "x*4 + y" to a single LEA instruction: https://godbolt.org/z/TvdW5sK4b
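
A minimal C sketch of that kind of function. The name times4_plus is hypothetical, and this is not necessarily the exact source behind the Compiler Explorer link. Unsigned types are used so overflow simply wraps, as it does in a register. The assembly in the comment is what GCC typically emits on x86-64 at -O2 (System V calling convention); exact output varies with compiler version and flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: x*4 + y, the kind of expression GCC folds into one LEA.
 * Typical x86-64 GCC -O2 output (x arrives in edi, y in esi):
 *     lea eax, [rsi+rdi*4]
 *     ret
 * The "*4" becomes the index scale and "+ y" becomes the base register. */
uint32_t times4_plus(uint32_t x, uint32_t y)
{
    return x * 4u + y;
}
```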
Aha, playing with it, it looks like the multiply is really just a shift: if you try anything other than a power of two, the compiler has to break it down into more instructions.

Still clever, though. I guess it's to make it easier and quicker to index over words or multiples of words?
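
A hedged C sketch of the non-power-of-two case mentioned above, using x*6 as an example. The function name times6 is hypothetical, and the two-instruction sequence in the comment is typical GCC x86-64 -O2 output rather than a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: x*6 cannot be a single LEA because the index scale
 * is limited to 1, 2, 4 or 8. GCC -O2 on x86-64 typically splits it up:
 *     lea eax, [rdi+rdi*2]   ; eax = x*3
 *     add eax, eax           ; eax = x*6
 * The C below spells out the same decomposition. */
uint32_t times6(uint32_t x)
{
    uint32_t times3 = x + (x << 1); /* base = index = x, scale 2 */
    return times3 + times3;         /* the doubling "add eax, eax" step */
}
```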

The addressing logic on the 80386 and later can add an index register shifted left by 0 to 3 bits (a scale of 1, 2, 4 or 8) to a base register plus an immediate offset.
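
A minimal C model of that effective-address arithmetic, wrapping modulo 2^32 like a 32-bit register. The function and parameter names are hypothetical; it models only the arithmetic LEA performs, not the instruction encoding or which registers are allowed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: effective address = base + (index << shift) + disp,
 * computed modulo 2^32. shift is 0..3, i.e. a scale of 1, 2, 4 or 8.
 * This is the value LEA writes to its destination; no memory is accessed. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned shift, int32_t disp)
{
    assert(shift <= 3); /* only scales 1, 2, 4 and 8 are encodable */
    return base + (index << shift) + (uint32_t)disp;
}
```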

By using the same register as both base and index, you can also multiply a register by 3, 5 or 9.
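
The base-equals-index trick can be sketched in C as x + (x << shift). The helper name times_2k_plus_1 is hypothetical; the LEA in the comment is one instruction that computes the x*9 case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: with the same register as base and index, LEA gives
 * x + x*scale. Scales 2, 4 and 8 (shift 1..3) yield x*3, x*5 and x*9,
 * e.g. lea eax, [rdi+rdi*8] computes x*9 in a single instruction. */
uint32_t times_2k_plus_1(uint32_t x, unsigned shift)
{
    assert(shift >= 1 && shift <= 3);
    return x + (x << shift);
}
```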

Earlier (16-bit) x86 chips did not have the scaling feature and were limited to certain combinations of base and index (BX/BP as base, SI/DI as index), so LEA was less useful. If the registers are carefully assigned, it could still be used to do an addition and put the result into a third register. Normal ALU operations always use one of the operands as their destination.
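
A hedged C illustration of that non-destructive addition. The function name sum_then_diff is hypothetical, and the comment assumes a sits in BX and b in SI (an allowed base/index pair); the assembly is illustrative, not compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: the sum is needed, but so are both inputs afterwards.
 * A two-operand ADD overwrites one operand, so 16-bit code needs a copy:
 *     mov ax, bx      ; ax = a
 *     add ax, si      ; ax = a + b
 * With a in BX and b in SI, one LEA leaves BX and SI untouched:
 *     lea ax, [bx+si] ; ax = a + b */
uint16_t sum_then_diff(uint16_t a, uint16_t b, uint16_t *diff)
{
    uint16_t s = (uint16_t)(a + b); /* wraps modulo 2^16, like AX */
    *diff = (uint16_t)(a - b);      /* a and b are still needed here */
    return s;
}
```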

The system might choose to use relocations for LEA and not for ADD -- this is of course not relevant for stack relative addressing and struct member addressing on the heap. I think I ran into this when coding assembler for DOS in the late '80s.

LEA also gets the "register + offset" thing done in a single instruction instead of two (MOV + ADD). It's also really easy to use for both assembler programmers and (dumb) compilers.
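
A minimal C sketch of the "register + offset" case: taking the address of a struct member from a pointer. The struct and function names are hypothetical, and the assembly in the comment is typical GCC -O2 output for System V x86-64 (p arrives in rdi).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type; with an int64_t first member, 'payload'
 * lands at offset 8 on x86-64. */
struct record {
    int64_t id;
    int payload[4];
};

/* Address of p->payload, i.e. p + offsetof(struct record, payload).
 * With LEA this is typically one instruction:
 *     lea rax, [rdi+8]
 * instead of mov rax, rdi followed by add rax, 8. */
int *payload_of(struct record *p)
{
    return p->payload;
}
```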

The "run in parallel" stuff is you looking at modern(ish) CPUs and thinking the original 8086 looked anything like that inside.