by mccoyb 28 days ago

Hoping to understand this better:

> Clojure's dynamism is granted by a great deal of both polymorphism and indirection, but this means LLVM has very few optimization opportunities when it's dealing with the LLVM IR from jank.

My mental model is that you lower Clojure code into LLVM IR containing a bunch of runtime calls (e.g. your `jank::runtime::dynamic_call`), with the generated code invoking the runtime over a C ABI.

If that's true, are there any optimizations that LLVM helps out with? Perhaps DCE? I can't tell immediately, so I'm curious about the answer.

(question is obviously about the pre-IR state of things)


codebje 27 days ago

The article talks about inlining a two-arity call to clojure.core/max to instead be an explicit call to cpp/jank.runtime.max, eliminating the unnecessary argument count matching and recursion portions of the Clojure function.
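As a rough illustration of what skipping the arity matching buys, here is a minimal C++ sketch. The names `max2`, `max_any`, `call_before`, and `call_after` are hypothetical and are not taken from jank or clojure.core; the sketch only assumes a generic entry point that checks the argument count and folds over the rest, next to a direct two-argument primitive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical two-argument primitive, standing in for a runtime max.
inline std::int64_t max2(std::int64_t a, std::int64_t b)
{
  return std::max(a, b);
}

// Hypothetical generic entry point: match on the argument count first,
// then fold the two-argument primitive over any remaining arguments.
inline std::int64_t max_any(std::vector<std::int64_t> const &args)
{
  assert(!args.empty());
  if(args.size() == 1)
  {
    return args[0];
  }
  std::int64_t acc{ max2(args[0], args[1]) };
  for(std::size_t i{ 2 }; i < args.size(); ++i)
  {
    acc = max2(acc, args[i]);
  }
  return acc;
}

// Call site before the rewrite: goes through the arity dispatch.
inline std::int64_t call_before(std::int64_t a, std::int64_t b)
{
  return max_any({ a, b });
}

// Call site after the rewrite: the arity is known, so call the primitive.
inline std::int64_t call_after(std::int64_t a, std::int64_t b)
{
  return max2(a, b);
}
```

Both call sites return the same value; the second one just never touches the size checks or the loop.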

It also mentions that in Clang the runtime max function will itself be inlined, so that's something LLVM ("the LLVM project", anyway) is still doing - and beyond that, as written this IR is likely to leave behind plenty of opportunities for LLVM to do the things it's good at: DCE, load/store optimisation, constant propagation, etc. And register allocation.

The jank::runtime::max call is itself complex: it has to type check its arguments and work out what to actually do based on the two types. If parts of these tests are done before the inlined call to max, there's a fair chance that LLVM will be able to eliminate their repetition and slim it all down a long way. In the Fibonacci example, a previous test will likely have identified whether the argument is an int or something else, and that should hopefully carry over to ::lte, ::sub, and ::add and simplify those down to just the single operator call. Sadly, I suspect it won't, at least for the addition, because the recursive call will lose the information that the return value is always a tagged integer when called with a tagged integer.
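To make the repeated-check point concrete, here is a minimal C++ sketch. Everything in it (`value`, `tag`, `lte`, `sub`, `add`, `fib`) is a hypothetical stand-in and not jank's actual runtime or its pointer-tagging scheme; it only assumes that each math operation inspects the type tags of both operands before doing the real work.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tagged number: a tag plus either an integer or a real.
enum class tag : std::uint8_t
{
  integer,
  real
};

struct value
{
  tag t;
  union
  {
    std::int64_t i;
    double d;
  };
};

inline value make_int(std::int64_t i) { value v; v.t = tag::integer; v.i = i; return v; }
inline value make_real(double d) { value v; v.t = tag::real; v.d = d; return v; }

inline std::int64_t as_int(value v)
{
  assert(v.t == tag::integer);
  return v.i;
}

inline double as_real(value v)
{
  return v.t == tag::integer ? static_cast<double>(v.i) : v.d;
}

// Each operation repeats the same tag test before the actual operator.
inline bool lte(value l, value r)
{
  if(l.t == tag::integer && r.t == tag::integer) { return l.i <= r.i; }
  return as_real(l) <= as_real(r);
}

inline value sub(value l, value r)
{
  if(l.t == tag::integer && r.t == tag::integer) { return make_int(l.i - r.i); }
  return make_real(as_real(l) - as_real(r));
}

inline value add(value l, value r)
{
  if(l.t == tag::integer && r.t == tag::integer) { return make_int(l.i + r.i); }
  return make_real(as_real(l) + as_real(r));
}

inline value fib(value n)
{
  // The tag of n is tested here ...
  if(lte(n, make_int(1))) { return n; }
  // ... and again inside both sub calls, on the very same n.
  // The operands of add come from recursive calls, so their tags are opaque here.
  return add(fib(sub(n, make_int(1))), fib(sub(n, make_int(2))));
}
```

Once `lte` and `sub` are inlined into `fib`, an optimizer can in principle see that the tag test on `n` is repeated and fold the later copies. The `add` is the hard case: its operands are return values of `fib`, so without specialisation nothing at that point says they are integers.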

A future optimisation might be to specialise for unboxed types: far more potential speed improvement over pointer tagging, and IMO quite amenable to analysis with the Jank IR (use :metadata to tag functions as specialised for <type> with the new entry point; if a function only calls specialised functions (and itself), it too can be specialised; and use a heuristic to determine whether specialisation gains enough to justify the extra space).


Jeaye 27 days ago

The first three paragraphs here are on point! jank's IR passes will not worry much about things like load/store optimization, register allocation, inlining C++ functions, etc. These are in LLVM's domain. We just worry about the Clojure side of things. Polymorphic math is intense, but we do our best to avoid the extra work by unboxing whenever possible.

> A future optimisation might be to specialise for unboxed types: far more potential speed improvement over pointer tagging, and IMO quite amenable to analysis with the Jank IR

All of these math functions are templates with four specific categories:

1. Object and object

2. Primitive and primitive

3. Primitive and object

4. Object and primitive

We handle the difference between typed objects (like integer_ref) and type-erased objects (object_ref) as well. This template then gets inlined, which is exactly what the last step of the benchmark optimizations (adding annotations) ensured. The return type of these functions will prefer primitive types, rather than automatically boxing. jank's analyzer tracks all types used, at compile-time, and supports automatic boxing. This means that we're already using the best available primitive math whenever we can and that it will indeed inline to just an operator call when working on two primitives, or two typed objects, or a combination thereof.
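For readers who want a picture of the four categories, here is a minimal C++17 sketch. It is not jank's code: `boxed_integer`, `raw`, `add`, and `usage` are hypothetical names, and the sketch only handles integers. It assumes a single template whose operands can each be either a primitive or a typed boxed object, with a primitive return type so that nothing is boxed automatically.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a typed, boxed integer object
// (not jank's real integer_ref).
struct boxed_integer
{
  std::int64_t data;
};

// Unwrap a boxed integer, or pass a primitive through unchanged.
// The branch is chosen at compile time, so no runtime type test remains.
template <typename T>
constexpr std::int64_t raw(T const &v)
{
  if constexpr(std::is_same_v<T, boxed_integer>)
  {
    return v.data;
  }
  else
  {
    static_assert(std::is_integral_v<T>, "this sketch only handles integers");
    return v;
  }
}

// One template covers all four operand categories and returns a primitive.
template <typename L, typename R>
constexpr std::int64_t add(L const &l, R const &r)
{
  return raw(l) + raw(r);
}

// Usage: every category ends up as a plain integer addition.
inline void usage()
{
  boxed_integer const b{ 40 };
  assert(add(b, b) == 80); // object and object
  assert(add(1, 2) == 3);  // primitive and primitive
  assert(add(2, b) == 42); // primitive and object
  assert(add(b, 2) == 42); // object and primitive
}
```

Because the operand types are known at compile time, each instantiation reduces to a single `+` once inlined. A type-erased object would still need a runtime check, which this sketch leaves out.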

You can see the code for this here: https://github.com/jank-lang/jank/blob/29c2adb344526d26c8e82...


codebje 26 days ago

Thanks for the response. I really like the measured, evidence-based approach you're taking to this work.

I have the wrong CPU architectures for the pre-built jank packages (x86 mac, aarch64 linux, the exact opposite of 'normal'), so I haven't actually looked at what it produces; my last paragraph was pure speculation. I appreciate the detail you gave!
