| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by tptacek 4866 days ago

This is more or less the greatest thing I've learned about in the last couple years.

What's happening here is that they're getting computation without executing any instructions, simply through the process of using the MMU hardware to "resolve addresses". The page directory system has been set up in such a way that address resolution effects a virtual machine that they can code to.

This works because when you attempt to resolve an invalid address, the CPU generates a trap (#PF), and the handling of that trap pushes information on the "stack". Each time you push data to the stack, you decrement the stack pointer. Eventually, the stack pointer underflows; when that happens, a different trap (#DF) fires. This mechanism put together gives you:

    if x < 4 { goto b } else { x = x - 4 ; goto a }

also known as "subtract and branch if less than or equal to zero", also known as "an instruction adequate to construct a one-instruction computer".

The virtual machine "runs" by generating an unending series of traps: in the "goto a" case, the result of translation is another address generating a trap. And so on.

The details of how this computer has "memory" and addresses instructions is even headachier. They're using the x86 TSS as "memory" and for technical reasons they get 16 slots (and thus instructions) to work with, but they have a compiler that builds arbitrary programs into 16-colored graphs to use those slots to express generic programs. Every emulator they could find crashes when they abuse the hardware task switching system this way.

Here's it running Conway's Life:

http://youtubedoubler.com/?video1=E2VCwBzGdPM&start1=0&#...

Here's their talk for a few months back:

http://www.youtube.com/watch?v=NGXvJ1GKBKM

The talk is great, but if you're not super interested in X86/X64 memory corruption countermeasures, you might want to skip the first 30 minutes.

3 comments

dbarlett 4866 days ago

Reminds me of "ICMP delay-line memory": http://stackoverflow.com/questions/12748246/sorting-1-millio...

link

0x0 4866 days ago

The slides in the github repo ( https://github.com/jbangert/trapcc/blob/master/slides/PFLA-s... ) also have a few interesting points, like "No publicly available simulator implements this correctly" (how did they record the youtube video?) and a few vague hints about exploiting this for doing VM escapes.

link

jey 4866 days ago

> "No publicly available simulator implements this correctly" (how did they record the youtube video?)

Probably by patching the emulator.

link

calt 4866 days ago

> how did they record the youtube video?

By running it on a physical machine. Unless there is a requirement that the processor not be multitasking that I am missing.

link

mikeash 4866 days ago

The Life video pretty clearly shows it running in Bochs. I assume they fixed it up.

link

jbangert 4866 days ago

Author here: Actually, the slide 'no publically available simulator implements this correctly' is somewhat misleading (it had a follow up slide that I cut and replaced by verbal comments - I should re-add it to the PDF). What I meant is that the entire mechanism (Task switching, page fault handling, etc.) is not implemented correctly on any sim, i.e. you have subtly different behavior. This was quite a challenge in debugging this in the first place, so I actually wrote the code to work around any bochs quirks I encountered -- just so that I have a debug environment.

link

tptacek 4866 days ago

By fixing bugs in Bochs?

link

jd007 4865 days ago

Isn't it a bit misleading to say that they are getting computation without executing any instructions? It's true that they are not executing any x86 instructions from the CPU, but the MMU is doing all the work by executing its instructions of address resolution.

Actually is it even true that they are not executing any x86 instructions on the CPU? From my understanding, the handling of page faults needs the CPU to execute some instructions. Maybe I'm wrong, if so could you enlighten me? Thanks.

link

simias 4865 days ago

I'm not very familiar with the x86 architecture, but usually when there's a fault the CPU generally attempts to lookup the address of a "callback" function in an interrupt/fault vector.

I suppose if you setup everything very carefully you can make it fault over and over again without giving it the time to execute any instruction.

Without looking into the specifics, I think it's very possible that the CPU is not actually executing any instructions, just waiting for the MMU to get a hold of itself. After all, in order to simply load the instructions you need the MMU to be responsive (or deactivated I suppose, if there's such a thing as no-MMU x86).

link

jd007 4865 days ago

If I understand MMU correctly, it is merely an address translator and physical memory data fetcher. It cannot process page faults, and when it encounters one, it will have to signal the CPU, because the CPU and the OS on top of it knows how to handle faults. Even if faults are generated repeatedly, doesn't the CPU still have to execute the instructions to push the stack which is how this "instruction-less" machine works? Unless there are certain PFs where the MMU will not signal the CPU and tries to handle the fault by itself.

link

limmeau 4865 days ago

The MMU is not so cleanly separable from the CPU on x86.

386 has both a segment mechanism and a paging mechanism. The segment mechanism has several luxury features like automatic saving and restoring of task context. It is possible to set up a segment descriptor so that the CPU, when jumping (or faulting) to an address in the segment, will automatically save task state at one address (taken from a register) and restore task state from another address (taken from the descriptor). Ibelieve that's what they use here. Hence, free memory accesses.

link