by saagarjha
1312 days ago
I’ve been working on an early design of a high-performance dynamic binary translator that cannot JIT, and have reached a conclusion very similar to the author’s. We have an existing threaded interpreter, but it’s a mess of hard-to-maintain assembly for two architectures, and we run into funny issues all the time where the two diverge. Plus, being handwritten by people who are not scheduling experts, there is probably some performance left on the table because of our poor choices and because the design makes it difficult to write complex-but-more-performant code. Nobody wants to write an efficient hash for TLB lookups in a software MMU using GAS macros.

The core point I’ve identified is that existing compilers are pretty good at converting high-level descriptions of operations into architecture-specific code (at least, better than we are, given the number of instructions we have to implement) but absolutely awful at doing register selection or dealing with the open control flow that is important for an interpreter. Writing everything in assembly lets you do these two, but you miss out on all the nice processor knowledge that LLVM has encoded into TableGen.

Anyways, the current plan is that we’re going to generate LLVM IR for each case and run it through a custom calling convention to take that load off the compiler, similar to what the author did here. There’s a lot more that I’m handwaving over that’s still going to be work, like whether we can automate the process of translating the semantics for each instruction into code, how we plan to pin registers, and how we plan to perform further optimizations on top of what the compiler spits out, but I think this is going to be the new way that people write interpreters. Nobody needs another bespoke macro assembler for every interpreter :)
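To make the shape of that idea concrete, here is a minimal C sketch of the general technique: one function per opcode, all sharing a signature, with each handler tail-calling the next so the interpreter state rides in argument registers. This is not the translator described above. The bytecode (`OP_ADDI`, `OP_MULI`, `OP_HALT`), the `Insn` layout, and the `run` entry point are all hypothetical, and it uses the platform's ordinary calling convention rather than a custom one. It assumes a compiler with `__attribute__((musttail))` for the guarantee and otherwise falls back to a plain call that the optimizer may or may not turn into a jump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ask for a guaranteed tail call where the compiler supports it.
   Assumption: without musttail we only get a tail call if the optimizer
   chooses to emit one, which is acceptable for a short demo program. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL /* no guarantee: rely on the optimizer */
#endif

/* Hypothetical toy bytecode, invented for this sketch. */
enum { OP_ADDI, OP_MULI, OP_HALT, OP_COUNT };

typedef struct {
    uint8_t op;
    int64_t imm;
} Insn;

/* Every handler shares this signature, so pc and acc are passed in the
   same argument registers from one handler to the next. */
typedef int64_t Handler(const Insn *pc, int64_t acc);

static Handler op_addi, op_muli, op_halt;

static Handler *const dispatch_table[OP_COUNT] = { op_addi, op_muli, op_halt };

/* Jump to the handler for the instruction at pc. */
#define DISPATCH(pc, acc) \
    MUSTTAIL return dispatch_table[(pc)->op]((pc), (acc))

static int64_t op_addi(const Insn *pc, int64_t acc) {
    acc += pc->imm;
    pc++;
    DISPATCH(pc, acc);
}

static int64_t op_muli(const Insn *pc, int64_t acc) {
    acc *= pc->imm;
    pc++;
    DISPATCH(pc, acc);
}

static int64_t op_halt(const Insn *pc, int64_t acc) {
    (void)pc; /* nothing left to decode */
    return acc;
}

/* Entry point: run a program that ends in OP_HALT, starting with acc = 0. */
int64_t run(const Insn *program) {
    assert(program != NULL);
    return dispatch_table[program->op](program, 0);
}
```

Running `{OP_ADDI, 5}, {OP_MULI, 3}, {OP_ADDI, -1}, {OP_HALT, 0}` through `run` computes `(0 + 5) * 3 - 1`. The point of the uniform signature is that the calling convention, not the register allocator's whims, decides where the hot state lives between handlers.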
One of the biggest problems is when cold paths compromise the efficiency of hot paths. You would hope that __builtin_expect() would help, but from what I can tell __builtin_expect() has no direct impact on register allocation. I wish the compiler would use this information to make sure that cold paths can never compromise the register allocation of the hot paths, but I constantly see register shuffles or spills on hot paths that are only for the benefit of cold paths.
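For readers who have not hit this, here is a small C sketch of the pattern, using a software TLB lookup as the hot path. Everything in it is hypothetical: the `Tlb` layout, the modulo "hash", and the fake page-table walk in `tlb_miss` are invented for illustration. It also shows one commonly used mitigation, which is my assumption and not something proposed above: push the cold path into a separate `noinline, cold` function so that its register needs sit behind a call instead of inside the hot function. Whether that actually keeps shuffles and spills off the hot path depends on the compiler and target, so the generated assembly still has to be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BITS   12
#define PAGE_MASK   ((UINT64_C(1) << PAGE_BITS) - 1)
#define TLB_ENTRIES 256

typedef struct {
    uint64_t vpage;
    uint64_t ppage;
    int valid;
} TlbEntry;

typedef struct {
    TlbEntry entries[TLB_ENTRIES];
    unsigned misses; /* counts how often the cold path ran */
} Tlb;

/* Cold path, kept out of line so its work is not inlined into translate().
   The "page walk" is a stand-in: it maps every virtual page to
   vpage + 0x100, which is purely an assumption for the demo. */
__attribute__((noinline, cold))
static uint64_t tlb_miss(Tlb *tlb, uint64_t vaddr) {
    assert(tlb != NULL);
    uint64_t vpage = vaddr >> PAGE_BITS;
    TlbEntry *e = &tlb->entries[vpage % TLB_ENTRIES];
    e->vpage = vpage;
    e->ppage = vpage + 0x100;
    e->valid = 1;
    tlb->misses++;
    return (e->ppage << PAGE_BITS) | (vaddr & PAGE_MASK);
}

/* Hot path: a hit is one load, one compare, and a little bit arithmetic.
   __builtin_expect marks the hit as likely, which influences block layout
   and branch direction. */
static inline uint64_t translate(Tlb *tlb, uint64_t vaddr) {
    uint64_t vpage = vaddr >> PAGE_BITS;
    const TlbEntry *e = &tlb->entries[vpage % TLB_ENTRIES];
    if (__builtin_expect(e->valid && e->vpage == vpage, 1)) {
        return (e->ppage << PAGE_BITS) | (vaddr & PAGE_MASK);
    }
    return tlb_miss(tlb, vaddr);
}
```

With a zeroed `Tlb`, the first `translate(&tlb, 0x1234)` goes through `tlb_miss` and later accesses to the same page stay on the hit path. Making the fallback a tail call at the end of the hot function, as here, means no hot-path values have to stay live across it.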
Is there anywhere I can follow your work? I am very interested in keeping track of the state of the art.