Hacker News new | ask | show | jobs
by bluGill 3266 days ago
Because that would require longer instructions and thus more memory.

Instructions on a CPU are something like to following (this is based on MIPS since x86 is a mess) The first 6 bits are the instruction, the rest is command specific. For add the next 12 would be 4 bits for each of the source registers and then the destination register and then various flags (overflow for example).

If instead they only worked on memory they would have a lot more possible instructions - but there isn't enough room on CPUs to design that many instructions anyway so who cares, followed by the all three memory addresses. This means that every CPU instruction needs to read 3 times as much memory before doing anything. Worse, most of those are pointers: when you compile the code you don't know the location of those address, so in most cases it is read the instruction from the program, then go back to the stack to read the address of the next values, then read those locations. That is a lot of memory access and memory access is expensive. Of course as you can say you can just use caching, but cache is expensive and now you need to add 3 times as much - this is too big for the fast level one cache so now you are expanding level two cache and seeing a lot more cache misses in the level one cache.

The above would all be okay, but it turns out that given enough registers (x86 fails here) in most cases you are operating on the same set of values all the time, (indeed the stack locations each of the above is referring too is probably a small set of variables) so if the compiler is careful it can manage all that. The compiler has better information on when things need to be committed to memory anyway so let it handle that.

1 comments

Not really. The TMS9900 [1] used memory as registers and had a fairly compact instruction set. Yes, it does have registers, but only three (a program counter, a status register, and a pointer to the current "register set" in memory). At the time, it was regarded as a slow machine, probably because of all the memory-to-memory operations.

[1] https://en.wikipedia.org/wiki/Texas_Instruments_TMS9900

Sure, this is one technique. You could also, akin to jump instructions, have a concept of data locality, versus instruction locality. You can do this is a lot of ways without resorting to something like segmentation, which everybody hates. Trivial would be something like a current "data pointer", which would see useful implicit updates, and well as explicit ones (akin to a long jump).