by paradroid 2895 days ago
This is basically asm or microcode at a higher level.

Yeah, I was pretty sure I've seen this in both Nim and Common Lisp, but don't compile-time macros in general allow for this? A comment even mentions source-text replacement with C functions. So it seems like the "jet" is just a kind of macro that is formally specified to behave as a pure transformation...
If macros are "GOTO", a jet is a "COMEFROM". A programmer knows they're calling a macro, and expects that code to change. A jet, meanwhile, affects code that was written without knowledge of the jet.
It sounds like aspect-oriented programming.
It's the opposite.
That's a stretch. If the only dimension you care about is whether or not the language spec is guaranteed to be invariant, sure, but that's essentially nitpicking. One more layer of abstraction gets you an asm->asm transpiler that never changes.
Take microcode: one human-visible instruction gets implemented by many microcoded instructions.

With jets, a string of many very simple human-visible instructions gets translated to just a few actual machine instructions.

So it's the opposite in the sense that a microcoded system expands each instruction you write, while a jet system contracts strings of the programmer's instructions into more efficient code.
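To make the contraction concrete, here is a minimal sketch assuming a hypothetical toy stack machine. The opcodes `PUSH`, `DUP` and `ADD`, the `JETS` table and `times_eight` are all illustrative, not taken from any real VM. At each step the interpreter checks whether a known instruction sequence starts at the current position and, if so, runs one native function in its place. Microcode would go the other way, expanding one instruction into many.

```python
from typing import Callable, Dict, List, Optional, Tuple

Instr = Tuple[str, int]                 # (opcode, immediate); only PUSH uses the immediate
Native = Callable[[List[int]], None]

def step(stack: List[int], instr: Instr) -> None:
    """Execute one primitive, human-visible instruction."""
    op, arg = instr
    if op == "PUSH":
        stack.append(arg)
    elif op == "DUP":
        stack.append(stack[-1])
    elif op == "ADD":
        rhs, lhs = stack.pop(), stack.pop()
        stack.append(lhs + rhs)
    else:
        raise ValueError(f"unknown opcode {op!r}")

def times_eight(stack: List[int]) -> None:
    """Native replacement with the same effect as (DUP ADD) x 3 on the top of the stack."""
    stack.append(stack.pop() * 8)

# Hypothetical jet table: exact instruction sequence -> native code with the same effect.
JETS: Dict[Tuple[Instr, ...], Native] = {
    (("DUP", 0), ("ADD", 0)) * 3: times_eight,
}

def match_jet(program: List[Instr], pc: int) -> Optional[Tuple[int, Native]]:
    """Decode-stage check: does a known sequence start at pc?"""
    for pattern, native in JETS.items():
        if tuple(program[pc:pc + len(pattern)]) == pattern:
            return len(pattern), native
    return None

def run(program: List[Instr], use_jets: bool = True) -> Tuple[List[int], int]:
    """Interpret program; return the final stack and the number of dispatches."""
    stack: List[int] = []
    pc = dispatches = 0
    while pc < len(program):
        dispatches += 1
        jet = match_jet(program, pc) if use_jets else None
        if jet is not None:
            length, native = jet
            native(stack)               # many instructions contracted into one native call
            pc += length
        else:
            step(stack, program[pc])
            pc += 1
    return stack, dispatches

program = [("PUSH", 5)] + [("DUP", 0), ("ADD", 0)] * 3
# Same value (5 * 8) computed with a different instruction shape: no jet matches.
variant = [("PUSH", 5), ("PUSH", 5), ("ADD", 0)] + [("DUP", 0), ("ADD", 0)] * 2
```

Both `program` runs leave `[40]` on the stack, but the jetted run needs 2 dispatches instead of 7. `variant` computes the same value yet never matches the pattern, so it gets no speedup: the jet depends on the shape of the code, not only its meaning.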

As one of the comments says: "The point is that jets create a particularly vicious abstraction inversion whereby a programmer must simultaneously think inside the object language and a kind of metalanguage."

To take advantage of jets, the programmer has to get the shape of the code right as well as the semantics. It's a bit like writing poetry, with the extra requirements of rhyme and meter in addition to the semantic requirements of straight prose. You have to keep an eye on what jets might match the code you write if you are going to get the required performance.

ASM and microcode have none of this intricacy.

> To take advantage of jets, the programmer has to get the shape of the code right as well as the semantics.

Not true at all, because jets are almost never something that exists in the language the programmer is writing in. They exist in the language the compiler is targeting.

Here's a concrete example (something otherwise missing in this thread): the keccak256() hash function in Solidity.

The keccak256() function just looks like any other function, from the perspective of writing code in Solidity.

But when you compile your Solidity program containing a keccak256() call, it compiles not to an inline implementation of keccak256(), or to a single intrinsic EVM opcode for "doing keccak256", but rather to a call to a keccak256-implementing smart contract at a "known" address, created alongside the particular Ethereum network.

That smart contract does have the plain EVM code in it to compute the keccak256 hash for a given input, and if you implemented a "dumb" Ethereum VM for your Ethereum network node, that's what would happen. It would be very slow and expensive to run, but it would work just fine.

But, instead, in less-naive EVM implementations (including the reference EVM), there's a jet: the EVM opcode sequence for "call the known keccak256 smart contract" is pattern-matched, and rather than actually making that call, the VM invokes a native keccak256() function. (Or, alternately, the definition of the "CALL" op checks the call address, and in the case of the known addresses, calls the native function instead. Same difference.)

The Solidity programmer remains completely unaware of this. Instead, it's a contract between the developers of Solidity, and the developers of (some) EVM implementations, that

1. Solidity will emit code in a structure that the EVM can pattern-match; and that

2. the EVM devs will ensure that such code is valid on all EVM implementations, with or without the jet (in this case by working with the networks to ensure there's a smart-contract in place at the known address that does keccak256 hashing.)

That's all that's required to get a jet working: mutual knowledge of the jet between the developer of a compiler targeting the ISA, and the developer of the interpreter/VM for that ISA.

> it compiles not to [...] a single intrinsic EVM opcode for "doing keccak256",

Sorry, no. That is exactly what it compiles to. Opcode 0x20 (erroneously called SHA3) computes a keccak256. See the yellow paper.

Maybe you meant SHA256 or RIPEMD160, which are indeed implemented as so-called "precompiled contracts".

> That smart contract does have the plain EVM code in it.

It does not. No EVM reference implementation is provided for the precompiled contracts. Precompiles do things the EVM is not able to do. It is not possible to have fallback EVM implementations.

> the EVM opcode sequence for "call the known keccak256 smart contract" is pattern-matched

No, it's not. Precompiles are handled as any other call and result in a call to something like `evm_do_call(address, input)`. This function then special-cases the addresses of precompiles. AFAIK no one does any kind of pattern matching.
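The address special-casing described here can be sketched as follows. This is a hypothetical Python model, not code from any EVM client: `evm_do_call` borrows the name used above, while 0x02 (SHA2-256) and 0x04 (identity) are the precompile addresses listed in the Yellow Paper. Gas accounting, ecrecover (0x01) and RIPEMD-160 (0x03) are left out, and the ordinary bytecode interpreter is only a stub.

```python
import hashlib
from typing import Callable, Dict

# Precompile addresses from the Yellow Paper: 0x02 is SHA2-256, 0x04 is identity (copy).
PRECOMPILES: Dict[int, Callable[[bytes], bytes]] = {
    0x02: lambda data: hashlib.sha256(data).digest(),
    0x04: lambda data: bytes(data),
}

def interpret_contract_code(address: int, data: bytes) -> bytes:
    """Stand-in for the ordinary bytecode interpreter, which this sketch does not model."""
    raise NotImplementedError(f"no bytecode interpreter for 0x{address:x} in this sketch")

def evm_do_call(address: int, data: bytes) -> bytes:
    """Hypothetical CALL handler: special-case precompile addresses, else interpret."""
    native = PRECOMPILES.get(address)
    if native is not None:
        return native(data)            # native code runs; no EVM bytecode is executed
    return interpret_contract_code(address, data)
```

Nothing about the call site changes: the caller issues an ordinary call, and only the callee's address decides whether native code runs, much like calling into a built-in library.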

> [conclusions]

The whole thing has little to do with pattern matching or (in)formal agreements between Solidity and EVM devs. Such a thing would be quite annoying.

A much better analogy is that the EVM comes with a built-in library of basic functions, much like `libc` does. And Solidity is aware of this library and offers them using a function call syntax, much like `printf`. The way they are called is not much different from how user-written libraries would be called.

> Maybe you meant SHA256 or RIPEMD160, which are indeed implemented as so-called "precompiled contracts".

Oops, yep, that's the one. I misremembered SHA256 as Keccak's SHA3_256. (I've been writing EVM ABI codegen lately; there's a lot of going back and forth between them.)

> Precompiles are handled as any other call and result in a call to something like `evm_do_call(address, input)`. This function then special-cases the addresses of precompiles.

Yes, that's what I said:

> Or, alternately, the definition of the "CALL" op checks the call address, and in the case of the known addresses, calls the native function instead. Same difference.

There are really two entirely-separate implementation techniques for "jets", and I sort of smushed them together, trying to explain the one using the other. I apologize. Let me be more explicit:

• There's one implementation technique that uses pattern-matching during the instruction decode pipeline stage of a VM or CPU. The EVM doesn't do this one.

• But there's another implementation technique, which the EVM does do, to the same effect. It's also the technique shared by Urbit's Nock VM (and Urbit made up the term "jet", so it definitely applies here.) Under this technique, functions (or in the EVM's case, smart contracts) are loaded into a virtual memory address space (contract address space) at predictable addresses given their content (in the EVM's case this is only true for the precompiles, but in Nock it's for everything†), and then the VM is written to rely on those predictable jump-target addresses (contract addresses), using either a LUT or hard-coded special-casing in the CALL op to find VM-intrinsic native code to jump to when the instruction-pointer would otherwise move to a "jetted" jump-target.

This call-site technique is essentially the same‡ as the pattern-matching case, in terms of its effects on the VM's interpretation speed, the constraints it places on the code, and the level of support a compiler-author must provide if they want to trigger the optimized behavior.

The only difference is that, in the second implementation, you name your bytecode sequences at some level (i.e. give them particular, predictable addresses), such that, rather than pattern-matching on the bytecode itself, you just have to pattern-match on these names when you're calling/jumping, in order to get the "jet" effect.

---

† Nock is not actually a bytecode VM, but a raw AST-level interpreter where the things being "named" with predictable addresses are the AST nodes loaded into the interpreter's memory. Thus, the Nock interpreter goes a lot slower than a regular bytecode VM by default, but in exchange has the ability to "jet" any arbitrary AST node with a native replacement. Given this, Nock could actually add an arbitrary-sub-function-level JIT (expression-level JIT?), despite being an AST-level interpreter that never generates intermediate bytecode. This essentially makes Nock equivalent in potential performance to the "instruction-decode-stage-jetted bytecode VM" implementation, but with fewer, more general "patterns" to recognize, since ASTs are more normalized than bytecode is. (It's like Erlang's parse-transforms, where you're pattern-matching a Core Erlang expression AST and replacing it with a NIF call; but instead of happening as a macro-expand step at compile-time, it happens at expression beta-reduction time for runtime-eval'ed code. This would be costly to pattern-match if you had to look at the expression as a tree; but, like I said, Nock gives the AST nodes predictable names based on their content, I think using a cryptographic hash of the subtree with variables hygienized, so you just have to pattern-match on the name.)
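A minimal sketch of the content-naming idea, under loud assumptions: the tuple-based expression language, `name_of`, `times_eight_tree` and the `JETS` table are all hypothetical, and this does not claim to reproduce how Urbit's Nock runtime registers jets. Each node's name is a SHA-256 hash built from its children's names (Merkle-style) and cached, so it is computed once per distinct node; the evaluator looks the name up before recursing. Unlike the hygienized scheme described above, variable names here are part of the hash.

```python
import hashlib
from collections import Counter
from functools import lru_cache
from typing import Callable, Dict, Optional, Tuple

# Expressions are nested tuples: ("lit", n), ("var", name) or ("add", lhs, rhs).
Expr = Tuple
JetTable = Dict[str, Callable[[Dict[str, int]], int]]

@lru_cache(maxsize=None)
def name_of(expr: Expr) -> str:
    """Content-derived name: a hash over the node tag and its children's names."""
    if expr[0] == "add":
        payload = f"add({name_of(expr[1])},{name_of(expr[2])})"
    else:
        payload = repr(expr)
    return hashlib.sha256(payload.encode()).hexdigest()

def times_eight_tree(var: str) -> Expr:
    """Build ((v+v)+(v+v))+(...): 8*v written with 'add' nodes only."""
    expr: Expr = ("var", var)
    for _ in range(3):
        expr = ("add", expr, expr)
    return expr

# Hypothetical jet table keyed by name; the native code must agree with the tree.
JETS: JetTable = {name_of(times_eight_tree("x")): lambda env: env["x"] * 8}

def evaluate(expr: Expr, env: Dict[str, int], stats: Counter,
             jets: Optional[JetTable] = None) -> int:
    """Tree-walking evaluator that checks each node's name against the jet table."""
    table = JETS if jets is None else jets
    stats["nodes"] += 1
    jet = table.get(name_of(expr))
    if jet is not None:
        stats["jets"] += 1
        return jet(env)                 # skip the whole subtree
    tag = expr[0]
    if tag == "lit":
        return expr[1]
    if tag == "var":
        return env[expr[1]]
    if tag == "add":
        return (evaluate(expr[1], env, stats, table)
                + evaluate(expr[2], env, stats, table))
    raise ValueError(f"unknown node {tag!r}")
```

Evaluating `("add", times_eight_tree("x"), ("lit", 1))` with `x = 5` gives 41 either way, but with the jet table the 15-node subtree costs a single lookup. `times_eight_tree("y")` has the same structure yet a different name, so it runs unjetted.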

‡ The call-site technique is also very similar to a JIT in terms of how the VM's bytecode-level CALL/JMP op is implemented. The difference comes down to how the LUT is being populated—by the JIT from optimized code, vs. by the VM's "special knowledge" of known intrinsic names that un-optimized code will predictably refer to. If you don't know your native-function "names" until you have a look at the code (or a precompiled function cache that came from looking at the code), you've got a JIT. If you know the "names" in advance (like in Ethereum!), you've got jets.
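The distinction drawn in this footnote can be sketched with one hypothetical call-site VM whose CALL consults a single lookup table. Everything here (`CallSiteVM`, the two-opcode "bytecode", the closure standing in for generated machine code) is illustrative: jets are entries placed in the table up front under names the VM knows in advance, while the toy JIT fills entries only after watching a function get called repeatedly.

```python
from typing import Callable, Dict, List, Optional

Native = Callable[[int], int]
PRIMS: Dict[str, Native] = {"inc": lambda x: x + 1, "double": lambda x: x * 2}

def interpret(body: List[str], x: int) -> int:
    """Slow path: run the 'bytecode' one primitive at a time."""
    for op in body:
        x = PRIMS[op](x)
    return x

def jit_compile(body: List[str]) -> Native:
    """Toy JIT: pre-resolve the ops into one closure (a stand-in for emitting native code)."""
    ops = [PRIMS[op] for op in body]
    def native(x: int) -> int:
        for f in ops:
            x = f(x)
        return x
    return native

class CallSiteVM:
    def __init__(self, functions: Dict[str, List[str]], jets: Dict[str, Native],
                 jit_threshold: Optional[int] = None) -> None:
        self.functions = functions
        self.lut: Dict[str, Native] = dict(jets)      # jets: filled before any code runs
        self.jit_threshold = jit_threshold
        self.call_counts: Dict[str, int] = {}
        self.interpreted = 0

    def call(self, name: str, arg: int) -> int:
        native = self.lut.get(name)
        if native is not None:
            return native(arg)                        # same fast path for jets and JIT output
        self.call_counts[name] = self.call_counts.get(name, 0) + 1
        if self.jit_threshold is not None and self.call_counts[name] >= self.jit_threshold:
            self.lut[name] = jit_compile(self.functions[name])  # JIT: filled by looking at code
        self.interpreted += 1
        return interpret(self.functions[name], arg)

functions = {"f": ["inc", "double"], "g": ["double", "double"]}
jetted = CallSiteVM(functions, jets={"g": lambda x: x * 4})
jitted = CallSiteVM(functions, jets={}, jit_threshold=2)
```

In both VMs the CALL path is identical; the only difference is who fills `lut`, and when.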

By your description, a jet is an intrinsic written by people who don't know about intrinsics. Plus the bonus of "if you don't trigger the intrinsic it literally costs you money" which is the kind of language-gotcha innovation I would expect from crypto-whatever.
That’s the way you’d look at it if the jet was there before the code got there. The usual point of jets is to make existing code go faster, without recompiling it (such as in the case where all code is immutable forever.)
I was talking about someone programming in the very simple language (e.g. Simplicity, as discussed in the LtU conversation) in which the jets do exist. I guess for you that would be a compiler implementor.

Edit: NB I'm drawing on this particular comment: http://lambda-the-ultimate.org/node/5482#comment-95188

What's the best way to say this...

It's a masochistic method of programming where you insert a randomly generated layer of abstraction between the hardware and the programmer, and tell the programmer to make sure to add enough hints to his code to make sure that it runs quickly.

The benefits are unclear.