by nikic 1136 days ago
From this kind of response, it's always so hard to tell whether it's easy for you because you have the relevant experience, or you just think it's easy because you don't.

A (very non-exhaustive) list of interesting questions about IR design:

* What's your model for delayed UB and UB materialization?

* What's your provenance model?

* What's your strategy for information retention and handling of flow-sensitive facts?

* What inherent canonicality is there?

* What's your approach to target or domain specific extensions?

* What is your strategy for formal verification of refinements?

Questions like "What instructions does your IR support?" are fairly uninteresting, and are not what IR design is (mostly) about.

It's worth noting that LLVM's own IR design doesn't have a very good answer to some of those questions either, in part because making changes to an IR that is as widely used as LLVM IR is hard (been there, done that). It's easier to design a new IR than to change an existing one -- however, unless you just want to reinvent past mistakes, it is certainly helpful to have deep familiarity with an existing IR design and its problems.

2 comments

You are correct in that I could be merely thinking it's easy. So let me answer your questions as best I can.

* UB is not exposed to the compiler/IR on purpose. I don't want compilers using it as an excuse to be adversarial like today's compilers. There is still UB, but as little as I can get away with (mostly in data races and race conditions).

* I'm designing a new provenance model right now. Incomplete. If anything is hard, it's this. And it might be.

* It is possible to attach information to any item or group of items. In fact, my IR will be able to attach so much information that it should be possible for high-level passes to reconstruct high-level code. Think MLIR with pure data. For example, you could group basic blocks together and label them as coming from a while loop. You will also be able to generate and use e-graphs. My model will also be different from LLVM's. Analysis will only happen on the original IR, with information generated only on the original. Thus, no information is lost before analysis.

* Canonicality is another one I'm working on and may be hard, though less hard than provenance because my IR uses basic block arguments instead of phi nodes. But the basic idea would be that optimization happens differently, so canonicalization should be easier than in LLVM because I'm going to design it to be.

* User-specified instructions with a way of defining their semantics in code.

* I'm starting with e-graphs and going from there. Still working on this.
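
To make two of the points above concrete (basic block arguments instead of phi nodes, and pure-data annotations attached to a group of blocks), here is a rough Python sketch. Every name in it (`Block`, `Jump`, `Region`, `check_jumps`, `regions_of`) is hypothetical and made up for illustration; it is not the actual IR, and the instruction strings are opaque placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Jump:
    """Terminator edge: branch to `target`, passing `args` to its parameters."""
    target: str
    args: list


@dataclass
class Block:
    name: str
    params: list                                # block arguments (play the role of phi nodes)
    body: list = field(default_factory=list)    # opaque instruction text (placeholder)
    jumps: list = field(default_factory=list)   # outgoing edges


@dataclass
class Region:
    """Pure-data annotation attached to a group of blocks (hypothetical)."""
    kind: str      # e.g. "while"
    blocks: list   # names of the blocks in the group


def check_jumps(blocks):
    """Return error strings for jumps whose arguments don't match the target's parameters."""
    errors = []
    for block in blocks.values():
        for jump in block.jumps:
            target = blocks.get(jump.target)
            if target is None:
                errors.append(f"{block.name}: unknown target {jump.target}")
            elif len(jump.args) != len(target.params):
                errors.append(
                    f"{block.name} -> {jump.target}: passes {len(jump.args)} "
                    f"args, expected {len(target.params)}"
                )
    return errors


def regions_of(block_name, regions):
    """List the kinds of all regions that contain the given block."""
    return [r.kind for r in regions if block_name in r.blocks]


def build_example():
    """A counting loop: `i` is a block argument where LLVM would use a phi."""
    blocks = {
        "entry": Block("entry", [], ["zero = const 0"], [Jump("header", ["zero"])]),
        "header": Block("header", ["i"], ["c = lt i, n"],
                        [Jump("body", ["i"]), Jump("exit", ["i"])]),
        "body": Block("body", ["i"], ["next = add i, 1"], [Jump("header", ["next"])]),
        "exit": Block("exit", ["result"], ["ret result"]),
    }
    # The annotation lives beside the blocks and does not change them.
    regions = [Region("while", ["header", "body"])]
    return blocks, regions
```

In this sketch, each incoming edge to `header` supplies the value of `i` explicitly, so the merge is visible on the edges themselves. The `Region` record is separate data, so a pass could read it (or ignore it) without touching the blocks.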

I agree with you that an instruction list is not interesting at all.

Anyway, yeah, you are right, but I think I have mostly satisfactory answers.

Is there a good resource you would recommend to get started understanding this as a hobby?