by derefr 2897 days ago
> To take advantage of jets, the programmer has to get the shape of the code right as well as the semantics.

Not true at all, because jets are almost never something that exists in the language the programmer is writing in. They exist in the language the compiler is targeting.

Here's a concrete example (something otherwise missing in this thread): the keccak256() hash function in Solidity.

The keccak256() function just looks like any other function, from the perspective of writing code in Solidity.

But when you compile your Solidity program containing a keccak256() call, it compiles not to an inline implementation of keccak256(), or to a single intrinsic EVM opcode for "doing keccak256", but rather to a call to a keccak256-implementing smart contract at a "known" address, created alongside the particular Ethereum network.

That smart contract does have the plain EVM code in it to compute the keccak256 hash for a given input, and if you implemented a "dumb" Ethereum VM for your Ethereum network node, that's what would happen. It would be very slow and expensive to run, but it would work just fine.

But, instead, in less-naive EVM implementations (including the reference EVM), there's a jet: the EVM opcode sequence for "call the known keccak256 smart contract" is pattern-matched, and instead of actually doing that, a native keccak256() function is called instead. (Or, alternately, the definition of the "CALL" op checks the call address, and in the case of the known addresses, calls the native function instead. Same difference.)

The Solidity programmer remains completely unaware of this. Instead, it's a contract between the developers of Solidity, and the developers of (some) EVM implementations, that

1. Solidity will emit code in a structure that the EVM can pattern-match; and that

2. the EVM devs will ensure that such code is valid on all EVM implementations, with or without the jet (in this case by working with the networks to ensure there's a smart-contract in place at the known address that does keccak256 hashing.)

That's all that's required to get a jet working: mutual knowledge of the jet between the developer of a compiler targeting the ISA, and the developer of the interpreter/VM for that ISA.


> it compiles not to [...] a single intrinsic EVM opcode for "doing keccak256",

Sorry, no. That is exactly what it compiles to. Opcode 0x20 (erroneously called SHA3) computes a keccak256. See the yellow paper.

Maybe you meant SHA256 or RIPEMD160, which are indeed implemented as so-called "precompiled contracts".

> That smart contract does have the plain EVM code in it.

It does not. No EVM reference implementation is provided for the precompiled contracts. Pre-compiles do things the EVM is not able to do. It is not possible to have fallback EVM implementations.

> the EVM opcode sequence for "call the known keccak256 smart contract" is pattern-matched

No it's not. Precompiles are handled like any other call and result in a call to something like `evm_do_call(address, input)`. This function then special-cases the addresses of precompiles. AFAIK no-one does any kind of pattern matching.
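To make the address special-casing concrete, here is a minimal Python sketch of that kind of dispatch. It is not taken from any real client: `evm_do_call` is just the name used above, and `PRECOMPILES`, `contracts`, and `run_bytecode` are hypothetical stand-ins. The two addresses shown (0x02 for SHA256, 0x04 for identity) are real precompile addresses.

```python
import hashlib

# Native implementations keyed by call address.
# 0x02 (SHA256) and 0x04 (identity) are real precompile addresses;
# the table itself and everything around it is a hypothetical sketch.
PRECOMPILES = {
    0x02: lambda data: hashlib.sha256(data).digest(),
    0x04: lambda data: data,
}

def evm_do_call(address, input_data, contracts, run_bytecode):
    """Handle a CALL: native code for precompile addresses, interpreter otherwise."""
    native = PRECOMPILES.get(address)
    if native is not None:
        return native(input_data)
    # Ordinary contract: hand its bytecode to the interpreter.
    return run_bytecode(contracts[address], input_data)

# Usage with a stand-in "interpreter" that only tags its input.
def fake_interpreter(code, data):
    return b"interpreted:" + data

contracts = {0x1234: b"\x00"}
digest = evm_do_call(0x02, b"abc", contracts, fake_interpreter)
other = evm_do_call(0x1234, b"hi", contracts, fake_interpreter)
```

Note that nothing here inspects the caller's bytecode: the only thing checked is the target address of the call.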

> [conclusions]

The whole thing has little to do with pattern matching or (in)formal agreements between Solidity and EVM devs. Such a thing would be quite annoying.

A much better analogy is that the EVM comes with a built-in library of basic functions, much like `libc` does. And Solidity is aware of this library and offers them using a function call syntax, much like `printf`. The way they are called is not much different from how user-written libraries would be called.

> Maybe you meant SHA256 or RIPEMD160, which are indeed implemented as so-called "precompiled contracts".

Oops, yep, that's the one. I misremembered SHA256 as Keccak's SHA3_256. (I've been writing EVM ABI codegen lately, there's a lot of going back-and-forth between them.)

> Precompiles are handled like any other call and result in a call to something like `evm_do_call(address, input)`. This function then special-cases the addresses of precompiles.

Yes, that's what I said:

> Or, alternately, the definition of the "CALL" op checks the call address, and in the case of the known addresses, calls the native function instead. Same difference.

There are really two entirely-separate implementation techniques for "jets", and I sort of smushed them together, trying to explain the one using the other. I apologize. Let me be more explicit:

• There's one implementation technique that uses pattern-matching during the instruction decode pipeline stage of a VM or CPU. The EVM doesn't do this one.

• But there's another implementation technique, which the EVM does do, to the same effect. It's also the technique shared by Urbit's Nock VM (and Urbit made up the term "jet", so it definitely applies here.) Under this technique, functions (or in the EVM's case, smart contracts) are loaded into a virtual memory address space (contract address space) at predictable addresses given their content (in the EVM's case this is only true for the precompiles, but in Nock it's for everything†), and then the VM is written to rely on those predictable jump-target addresses (contract addresses), using either a LUT or hard-coded special-casing in the CALL op to find VM-intrinsic native code to jump to when the instruction-pointer would otherwise move to a "jetted" jump-target.

This call-site technique is essentially the same‡ as the pattern-matching case, in terms of its effects on the VM's interpretation speed, the constraints it places on the code, and the level of support a compiler-author must provide if they want to trigger the optimized behavior.

The only difference is that, in the second implementation, you name your bytecode sequences at some level (i.e. give them particular, predictable addresses), such that, rather than pattern-matching on the bytecode itself, you just have to pattern-match on these names when you're calling/jumping, in order to get the "jet" effect.
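A rough Python sketch of this second technique, under loud assumptions: the "name" of a piece of code here is simply the SHA-256 of its `repr`, which is an illustrative choice and not how Nock or the EVM actually derive names. `name_of`, `interpret`, `JETS`, and `call` are all hypothetical.

```python
import hashlib

def name_of(expr):
    """Content-derived name for an expression tree (illustrative scheme only)."""
    return hashlib.sha256(repr(expr).encode()).hexdigest()

def interpret(expr, arg):
    """Slow path: a tiny interpreter over nested tuples."""
    op = expr[0]
    if op == "arg":
        return arg
    if op == "const":
        return expr[1]
    if op == "add":
        return interpret(expr[1], arg) + interpret(expr[2], arg)
    if op == "mul":
        return interpret(expr[1], arg) * interpret(expr[2], arg)
    raise ValueError(f"unknown op: {op!r}")

# A function body the VM author knows about in advance: x*x + 1.
SQUARE_PLUS_ONE = ("add", ("mul", ("arg",), ("arg",)), ("const", 1))

# Jet table: known name -> native replacement with the same semantics.
JETS = {name_of(SQUARE_PLUS_ONE): lambda x: x * x + 1}

def call(expr, arg, stats):
    """Call expr on arg, using a jet when the callee's name is known."""
    jet = JETS.get(name_of(expr))
    if jet is not None:
        stats["jetted"] += 1
        return jet(arg)
    stats["interpreted"] += 1
    return interpret(expr, arg)

stats = {"jetted": 0, "interpreted": 0}
a = call(SQUARE_PLUS_ONE, 5, stats)                   # jetted
b = call(("add", ("arg",), ("const", 2)), 5, stats)   # falls back to interpret()
```

The match is on the name at call time, never on the shape of the code being executed, and the unjetted path still gives the same answer.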

---

† Nock is not actually a bytecode VM, but a raw AST-level interpreter where the things being "named" with predictable addresses are the AST nodes loaded into the interpreter's memory. Thus, the Nock interpreter goes a lot slower than a regular bytecode VM by default, but in exchange has the ability to "jet" any AST node at random with a native replacement. Given this, Nock could actually add an arbitrary-sub-function-level JIT (expression-level JIT?), despite being an AST-level interpreter that never generates intermediate bytecode. This essentially makes Nock equivalent in potential performance to the "instruction-decode-stage-jetted bytecode VM" implementation, but with fewer, more general "patterns" to recognize, since ASTs are more normalized than bytecode is. (It's like Erlang's parse-transforms, where you're pattern-matching a Core Erlang expression AST and replacing it with a NIF call; but instead of happening as a macro-expand step at compile-time, it happens at expression beta-reduction time for runtime-eval'ed code. This would be costly to pattern-match if you had to look at the expression as a tree; but, like I said, Nock gives the AST nodes predictable names based on their content, I think using a cryptographic hash of the subtree with variables hygienized, so you just have to pattern-match on the name.)

‡ The call-site technique is also very similar to a JIT in terms of how the VM's bytecode-level CALL/JMP op is implemented. The difference comes down to how the LUT is being populated—by the JIT from optimized code, vs. by the VM's "special knowledge" of known intrinsic names that un-optimized code will predictably refer to. If you don't know your native-function "names" until you have a look at the code (or a precompiled function cache that came from looking at the code), you've got a JIT. If you know the "names" in advance (like in Ethereum!), you've got jets.

By your description, a jet is an intrinsic written by people who don't know about intrinsics. Plus the bonus of "if you don't trigger the intrinsic it literally costs you money" which is the kind of language-gotcha innovation I would expect from crypto-whatever.

That’s the way you’d look at it if the jet was there before the code got there. The usual point of jets is to make existing code go faster, without recompiling it (such as in the case where all code is immutable forever.)

Um yes JIT intrinsics do that. Well, the JIT is recompiling as needed (or more often), but your bytecode isn't changing.

Jets exist without a JIT. In fact, the whole point of distinguishing "jets" as an idea is that they're a potential feature of naive bytecode interpreters, rather than of JITing interpreters or of (potentially optimizing) compilers.

I mean, you can think of an interpreter with jets, but without a JIT, as an interpreter which passes the code it loads through a specific kind of JIT that only does native codegen for explicitly specified patterns, and has no "generic" path, instead leaving everything else as calls back into the interpreter.

But that's not actually what's happening, because there is no JIT in such interpreters. JITing necessarily happens either during code-loading, or asynchronously with a profiling thread. Jets, meanwhile, happen within the instruction-stream decode stage of a VM, where the VM has enough of a read-ahead buffer that it can decode a long sequence of plain instructions that fits a given pattern, to an intrinsic. Jets are "recognized" within the VM's instruction pipeline itself, each time the pipeline's state matches a given bytecode-sequence pattern. They're a register-transfer level optimization.
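As a rough illustration of decode-stage recognition, here is a toy stack machine in Python whose dispatch loop reads ahead and checks whether the upcoming instructions match a known pattern. The instruction set, `POW4_PATTERN`, and `run` are invented for this sketch, and it assumes straight-line code (no jumps into the middle of a pattern).

```python
# Toy stack-machine bytecode: each instruction is a tuple.
# DUP, MUL, DUP, MUL turns x on top of the stack into x**4.
POW4_PATTERN = (("DUP",), ("MUL",), ("DUP",), ("MUL",))

def run(code, stack, use_jets=True):
    """Execute code; return (final stack, number of dispatch steps)."""
    stack = list(stack)
    pc = steps = 0
    n = len(POW4_PATTERN)
    while pc < len(code):
        steps += 1
        # Read-ahead: does the upcoming window match the known pattern?
        if use_jets and tuple(code[pc:pc + n]) == POW4_PATTERN:
            stack.append(stack.pop() ** 4)  # native replacement
            pc += n
            continue
        op = code[pc]
        if op[0] == "PUSH":
            stack.append(op[1])
        elif op[0] == "DUP":
            stack.append(stack[-1])
        elif op[0] == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown op: {op!r}")
        pc += 1
    return stack, steps

program = [("PUSH", 3), ("DUP",), ("MUL",), ("DUP",), ("MUL",)]
jetted = run(program, [])                  # ([81], 2)
plain = run(program, [], use_jets=False)   # ([81], 5)
```

The bytecode is never rewritten or compiled; the same program gives the same result either way, and only the number of dispatch steps changes.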

In essence, jets are an implementation technique for bytecode interpreter optimization, alternative and complementary to a full JIT.

---

Also, I'm speaking about VMs here, but jets apply as a thing you can do just as well in a hardware CPU design, too.

An example of a common "hardware jet" is recognizing a multi-byte no-op sequence (that is, not a multi-byte NOP intrinsic, using a long instruction, but rather a sequence of single-byte NOPs), and making it have the same effect as a multibyte no-op intrinsic of the same size (i.e. given a NOP sequence of length N, the replacement would free up the ALU and other later pipeline stages for N-1 cycles.)
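In software terms, that NOP example might look something like the following Python sketch of a decode step. The `"NOPN"` micro-op name and the `decode` function are made up for illustration and do not describe any particular CPU.

```python
def decode(instructions):
    """Collapse each run of single NOPs into one ("NOPN", length) micro-op."""
    out = []
    i = 0
    while i < len(instructions):
        if instructions[i] == "NOP":
            # Scan to the end of the run of consecutive NOPs.
            j = i
            while j < len(instructions) and instructions[j] == "NOP":
                j += 1
            out.append(("NOPN", j - i))
            i = j
        else:
            out.append((instructions[i],))
            i += 1
    return out

micro_ops = decode(["NOP", "NOP", "NOP", "ADD", "NOP"])
# [("NOPN", 3), ("ADD",), ("NOPN", 1)]
```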

(There's probably another name this technique is known by in the hardware world—I'm not a hardware guy. I'm just highlighting the equivalence.)

And this isn't just a particular kind of microcode expansion, either. Microcode expansion is effectively a kind of cheap, throw-away JIT: CPUs expand their ISA to microcode not during the decode stage of their pipeline, but rather when the instruction pointer's movement causes the CPU to copy a new chunk of a code page into a cache-line. The cache line is expanded as a whole, and the result stored in a per-core microcode buffer. This works for regular instructions, since regular instructions are necessarily cache-line aligned; but the sort of composite, multi-instruction patterns a CPU might want to recognize aren't guaranteed to be cache-line aligned, so they won't get caught by this pass. A "hardware jet", on the other hand—since it happens at the register-transfer level—can optimize these instructions just fine. So CPU designers use both.

Thank you (seriously) for the detailed explanation.

JITs already do this, to my mind. What else are you going to call it when HotSpot sees bytecode for a loop initializing an array and replaces it with a call to memset? Special-casing instruction patterns is applicable whether or not you're doing native codegen. The big difference is that you have these VMs which have decided not to JIT so they need to fix their perf problems somewhere else, because (and Nock is especially bad about this) a naive interpreter is unacceptably slow.

I was talking about someone programming in the very simple language (e.g. Simplicity, as discussed in the LtU conversation) in which the jets do exist. I guess for you that would be a compiler implementor.

Edit: NB I'm drawing on this particular comment: http://lambda-the-ultimate.org/node/5482#comment-95188