Hacker News new | ask | show | jobs
by pslam 4024 days ago
That's a fun program. Similar techniques to this are used to create "ROP" (return-oriented programming) exploits. You get enough building blocks to get general purpose programming, and it's "just" matter of mapping to those building blocks.

X86 "mov" is a bit of a cheat, though. It's a single mnemonic for what's really dozens of quite different instructions. For example, ARM has "mov" for register-register operations, but "ldr/str" for register-memory, while X86 just uses mov for everything. Even ARM mov optionally puts a barrel shifter into the path, so it's not that pure either.

3 comments

Even if they weren't a single mnemonic, if you read the paper you'll find that reg = mem[reg + C], mem[reg + C] = reg, and reg = C appear to be the only 3 instruction types that are actually needed; these are only register-memory and register-constant data transfers. Even register-register isn't needed.
register-register movs are needed because compilers are not the world's smartest code generators, and we don't have an infinite number of registers. The biggest problem arises when some calling convention expects certain things to be in certain registers - unless you do some pretty extreme long term planning in your register allocation code in the compiler, you're pretty likely to eventually wind up with some piece of information in a register that you need to move to another register before making or returning from a call.

Of course, modern processors don't really care about register-register moves because they have hundreds of hidden intermediate registers and register renaming which makes those types of moves "free." (e.g. Haswell has 168 GP registers, only 16 of which are exposed through the AMD64 ISA.)

Note that you don't need a dumb compiler, register pressure, or an instruction with physical register constraints to require a copy. Two-address instructions suffice: simply have a register for the first operand that contains a value that is used later. A copy is required to preserve the value stomped by the mutating instruction.

In any case I believe the point of the GP is that any necessary copy can be done by going through memory, so register-register moves can be dispensed with where minimalism is the goal.

Do you have a citation for the 168GP registers, just wondering if Intel have "offically" stated the number they use in a given microarch.
You can find an official Intel source for Haswell in here (page 15): http://pages.cs.wisc.edu/~rajwar/papers/ieee_micro_haswell.p...
I might be misreading this, but I think it's on slide 24 of this PDF http://www.hotchips.org/wp-content/uploads/hc_archives/hc25/...
Yes, and these are effectively 3 different instructions sharing the same mnenomic. If you assemble them:

        mov     0x1000(%rax,%rax), %eax
        mov     %eax, 0x1000(%rax,%rax)
        mov     $7, %eax
You get:

        0000000000000000	8b840000100000  	movl	0x1000(%rax,%rax), %eax
        0000000000000007	89840000100000  	movl	%eax, 0x1000(%rax,%rax)
        000000000000000e	b807000000      	movl	$0x7, %eax
Which is basically 0x8b->Load, 0x89->Store, 0xb8->Constant.
ARM mov can be used as a JMP <addr in reg>. Program counter is in fact register R15 in ARM.
This changed with V8, IIRC the PC can only be updated on branch or an exception...
If you're going down that path, all of x86 is used as a 'bytecode' that turns into microcode instructions anyway in modern processors.
Not exactly, most instructions are not "microcoded" for performance reasons. But there is a class of instructions that is.
Intel CPUs translate x86 to some internal RISC-like "uops" before doing optimizations and execution. This is a widely documented fact (though Intel doesn't really talk publicly about it as far as I know): see for example ยง2.1 in http://www.agner.org/optimize/microarchitecture.pdf
For a (hopefully!) more accessible introduction to the crazy world of what really happens inside the x86, I did a talk at GOTO: http://youtu.be/hgcNM-6wr34 - it's a really fascinating subject!
For marketing reasons, Intel made quite a big commotion over the fact that its Pentium was "RISC-like" when it was first released, but in reality what it meant was using a decoder that could emit multiple uops in parallel instead of sequentially like it was in the 486.

Even CPUs considered RISC today, like ARMs, need to decode instructions into one or more wider uops for efficient execution.