

by derefr 2895 days ago

> Maybe you meant SHA256 or RIPEMD160, which are indeed implemented as so-called "precompiled contracts".

Oops, yep, that's the one. I misremembered SHA256 as Keccak's SHA3_256. (I've been writing EVM ABI codegen lately, and there's a lot of going back and forth between the two.)

> Precompiles are handled as any other call and results in a call to something like `evm_do_call(address, input)`. This function then special cases the addresses of precompiles.

Yes, that's what I said:

> Or, alternately, the definition of the "CALL" op checks the call address, and in the case of the known addresses, calls the native function instead. Same difference.

There are really two entirely-separate implementation techniques for "jets", and I sort of smushed them together, trying to explain the one using the other. I apologize. Let me be more explicit:

• There's one implementation technique that uses pattern-matching during the instruction decode pipeline stage of a VM or CPU. The EVM doesn't do this one.

• But there's another implementation technique, which the EVM does use, to the same effect. It's also the technique used by Urbit's Nock VM (and Urbit made up the term "jet", so it definitely applies here). Under this technique, functions (or in the EVM's case, smart contracts) are loaded into a virtual memory address space (the contract address space) at addresses that are predictable given their content. In the EVM's case this is only true for the precompiles, but in Nock it's true for everything†. The VM is then written to rely on those predictable jump-target addresses (contract addresses): it uses either a LUT or hard-coded special-casing in the CALL op to find VM-intrinsic native code to jump to whenever the instruction pointer would otherwise move to a "jetted" jump target.
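
To make the call-site technique concrete, here is a minimal Python sketch of a CALL handler that special-cases known addresses. It is a toy, not code from any real EVM client: `JETS`, `interpret`, and the `contracts` dict are names made up for illustration, and the "interpreter" is a stand-in that just reverses its input so the two paths are distinguishable. The only facts borrowed from Ethereum are that address 0x02 is the SHA256 precompile and 0x04 is the identity precompile.

```python
import hashlib
from typing import Callable, Dict

NativeFn = Callable[[bytes], bytes]

# Hypothetical LUT: known call addresses -> VM-intrinsic native code.
# 0x02 (SHA256) and 0x04 (identity) mirror real EVM precompile addresses.
JETS: Dict[int, NativeFn] = {
    0x02: lambda data: hashlib.sha256(data).digest(),
    0x04: lambda data: data,
}


def interpret(bytecode: bytes, data: bytes) -> bytes:
    """Stand-in for the slow bytecode interpreter (assumption: a real VM
    would execute `bytecode` here; this toy just reverses the input)."""
    return data[::-1]


def evm_do_call(contracts: Dict[int, bytes], address: int, data: bytes) -> bytes:
    """CALL op: check the target address against the LUT before interpreting."""
    native = JETS.get(address)
    if native is not None:
        # "Jetted" target: never load or interpret any bytecode.
        return native(data)
    return interpret(contracts.get(address, b""), data)
```

Calling `evm_do_call({}, 0x02, b"abc")` takes the native path even though no contract code is stored at that address, while a call to any other address falls through to the interpreter.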

This call-site technique is essentially the same‡ as the pattern-matching case, in terms of its effects on the VM's interpretation speed, the constraints it places on the code, and the level of support a compiler-author must provide if they want to trigger the optimized behavior.

The only difference is that, in the second implementation, you name your bytecode sequences at some level (i.e. give them particular, predictable addresses), such that, rather than pattern-matching on the bytecode itself, you just have to pattern-match on these names when you're calling/jumping, in order to get the "jet" effect.
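
For contrast, here is a rough Python sketch of the first technique, where the VM pattern-matches on the bytecode itself at decode time. The stack machine, its three ops (`DUP`, `MUL`, `INC`), and the `PATTERNS` table are all invented for this example; no real VM is being described.

```python
from typing import Callable, Dict, List, Tuple


def native_square(stack: List[int]) -> None:
    """Native replacement for the op sequence DUP, MUL."""
    x = stack.pop()
    stack.append(x * x)


# Hypothetical table: known op sequences -> native replacement.
PATTERNS: Dict[Tuple[str, ...], Callable[[List[int]], None]] = {
    ("DUP", "MUL"): native_square,
}


def step_plain(op: str, stack: List[int]) -> None:
    """Ordinary one-op-at-a-time interpretation."""
    if op == "DUP":
        stack.append(stack[-1])
    elif op == "MUL":
        b = stack.pop()
        a = stack.pop()
        stack.append(a * b)
    elif op == "INC":
        stack.append(stack.pop() + 1)
    else:
        raise ValueError(f"unknown op: {op}")


def run(code: List[str], stack: List[int]) -> Tuple[List[int], int]:
    """Run `code`; return the final stack and how many times a jet fired."""
    pc = 0
    jetted = 0
    while pc < len(code):
        for pattern, native in PATTERNS.items():
            # Decode stage: does the upcoming op sequence match a known pattern?
            if tuple(code[pc:pc + len(pattern)]) == pattern:
                native(stack)
                pc += len(pattern)
                jetted += 1
                break
        else:
            step_plain(code[pc], stack)
            pc += 1
    return stack, jetted
```

Here the compiler author has to emit exactly `DUP, MUL` to trigger the fast path; `DUP, INC, MUL` runs entirely through `step_plain`. With the call-site technique, the equivalent constraint is calling exactly the right address.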

---

† Nock is not actually a bytecode VM, but a raw AST-level interpreter where the things being "named" with predictable addresses are the AST nodes loaded into the interpreter's memory. Thus, the Nock interpreter goes a lot slower than a regular bytecode VM by default, but in exchange has the ability to "jet" any arbitrary AST node with a native replacement. Given this, Nock could actually add an arbitrary-sub-function-level JIT (expression-level JIT?), despite being an AST-level interpreter that never generates intermediate bytecode. This essentially makes Nock equivalent in potential performance to the "instruction-decode-stage-jetted bytecode VM" implementation, but with fewer, more general "patterns" to recognize, since ASTs are more normalized than bytecode is. (It's like Erlang's parse-transforms, where you're pattern-matching a Core Erlang expression AST and replacing it with a NIF call; but instead of happening as a macro-expand step at compile-time, it happens at expression beta-reduction time for runtime-eval'ed code. This would be costly to pattern-match if you had to look at the expression as a tree; but, like I said, Nock gives the AST nodes predictable names based on their content (I think using a cryptographic hash of the subtree with variables hygienized), so you just have to pattern-match on the name.)
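
A toy Python illustration of "name the AST node by its content, then match on the name". This is not how Nock is actually implemented; the tuple-based AST, `node_name`, and the `jets` dict are assumptions made up for the sketch, and hashing `repr(node)` stands in for whatever naming scheme a real interpreter would use.

```python
import hashlib
from typing import Callable, Dict


def node_name(node: tuple) -> str:
    """Content-derived name for an AST node (assumption: SHA-256 of its repr).
    A real interpreter would cache this instead of rehashing on every eval."""
    return hashlib.sha256(repr(node).encode()).hexdigest()


# Toy AST: ("lit", n), ("arg",), ("add", a, b), ("mul", a, b)
def evaluate(node: tuple, arg: int,
             jets: Dict[str, Callable[[int], int]],
             stats: Dict[str, int]) -> int:
    # Match on the node's name, not on the shape of its subtree.
    native = jets.get(node_name(node))
    if native is not None:
        stats["jetted"] += 1
        return native(arg)
    tag = node[0]
    if tag == "lit":
        return node[1]
    if tag == "arg":
        return arg
    a = evaluate(node[1], arg, jets, stats)
    b = evaluate(node[2], arg, jets, stats)
    if tag == "add":
        return a + b
    if tag == "mul":
        return a * b
    raise ValueError(f"unknown node: {tag}")


SQUARE = ("mul", ("arg",), ("arg",))
jets = {node_name(SQUARE): lambda x: x * x}  # native replacement for SQUARE
expr = ("add", SQUARE, ("lit", 1))           # arg*arg + 1
```

Evaluating `expr` walks the outer `add` node normally, but the `SQUARE` subtree is replaced by the native function as soon as its name is found in `jets`.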

‡ The call-site technique is also very similar to a JIT in terms of how the VM's bytecode-level CALL/JMP op is implemented. The difference comes down to how the LUT is being populated—by the JIT from optimized code, vs. by the VM's "special knowledge" of known intrinsic names that un-optimized code will predictably refer to. If you don't know your native-function "names" until you have a look at the code (or a precompiled function cache that came from looking at the code), you've got a JIT. If you know the "names" in advance (like in Ethereum!), you've got jets.
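
A small Python sketch of that distinction, under made-up names (`VM`, `preloaded`, `hot_threshold`, `compile`): the `call` path is identical in both modes, and the only thing that changes is who fills the LUT and when. The "compile" step is a placeholder that reuses the slow callable, since actual code generation is beside the point here.

```python
from typing import Callable, Dict, Optional

Fn = Callable[[int], int]


class VM:
    def __init__(self, functions: Dict[str, Fn],
                 preloaded: Optional[Dict[str, Fn]] = None,
                 hot_threshold: Optional[int] = None) -> None:
        self.functions = functions          # name -> slow "interpreted" code
        self.lut: Dict[str, Fn] = dict(preloaded or {})  # name -> fast code
        self.hot_threshold = hot_threshold  # None means "no JIT"
        self.counts: Dict[str, int] = {}

    def compile(self, name: str) -> Fn:
        """Placeholder for JIT compilation (assumption: a real JIT would
        generate optimized native code after looking at the function)."""
        return self.functions[name]

    def call(self, name: str, arg: int) -> int:
        fast = self.lut.get(name)
        if fast is not None:
            return fast(arg)                # same lookup for jets and JIT
        if self.hot_threshold is not None:
            # JIT: names enter the LUT only after the code has been observed.
            self.counts[name] = self.counts.get(name, 0) + 1
            if self.counts[name] >= self.hot_threshold:
                self.lut[name] = self.compile(name)
        return self.functions[name](arg)


def slow_sq(x: int) -> int:
    return sum(x for _ in range(x))         # deliberately roundabout x*x


def native_sq(x: int) -> int:
    return x * x


jet_vm = VM({"sq": slow_sq}, preloaded={"sq": native_sq})  # names known up front
jit_vm = VM({"sq": slow_sq}, hot_threshold=2)              # names discovered at runtime
```

In `jet_vm` the LUT already contains `"sq"` before any code runs; in `jit_vm` it only appears after the second call.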