|
It was almost definitely HP Dynamo. (Edit: if you combine ideas from HP Dynamo, SafeTSA JIT-optimized bytecode, and IBM's AS/400's TIMI/Technology Independent Machine Interface, you get a better version of the current Android Run Time for bytecode-distributed apps that compile ahead of time to native code and self-optimize at runtime based on low-overhead profiling.) The really nice thing about Dynamo was that it was a relatively simple trace-based JIT compiler from native code to native code (plus a native code interpreter for non-hotspots). This meant that it would automatically inline hotspots across DLLs and through C++ virtual method dispatches (with appropriate guards to jump back to interpreter mode if the virtual method implementation didn't match or the PLT entry got modified). They didn't have to do any special-casing of the interpreter to handle virtual method calls or cross-DLL calls, it's just a natural consequence of a trace-based JIT from native code to native code. The only downsides of something like Dynamo are (1) a bit of complexity and space usage (2) some startup overhead due to starting in interpretive mode and (3) if your program is abnormal in not having a roughly Zipf distribution of CPU usage, the overhead is going to be higher. Ever since I read about Michael Franz et al.'s SafeTSA SSA-based JVM bytecode that more quickly generated higher-performing native code, I've had a long-term back-burner idea to write a C compiler that generates native code in a particular way (functions are all compiled to arrays of pointers to strait-line extended basic blocks) that makes tracing easier, and also storing a SafeTSA-like SSA bytecode along with the native code. That way, a Dynamo-like runtime wouldn't use an interpreter, and when it came to generate an optimized trace, it could skip the first step of decompiling native code to an SSA form. (Also, the SSA would be a bit cleaner as input for an optimizer, as the compilation-decompilation round-trip tends to make the SSA a bit harder to optimize, as shown by Franz's modification of Pizza/JikesRVM to run both SafeTSA and JVM bytecode.) Once you have your trace, you don't need on-stack replacement to get code in a tight loop to go into the optimized trace, you just swap one pointer to native code in the function's array of basic blocks. (All basic blocks are strait-line code, so the only way to loop is to jump back to the start of the same basic block via the array of basic block pointers.) The background for HP Dynamo is that during the Unix wars, there were a bunch of RISC system vendors vying for both the high-end workstation and server markets. Sun had SPARC, SGI had MIPS, DEC had Alpha AXP (and earlier, some MIPS DECStations) and HP had PA-RISC. The HP Dynamo research project wanted to show that emulation via dynamic recompilation could be fast, so to get an apples-to-apples comparison for emulation overhead, they wrote a PA-RISC emulator for PA-RISC. |
It's used by the winafl fuzzer to provide basic block coverage for black box binaries.
https://dynamorio.org