by kjeetgill
2907 days ago
Awesome. I wonder how well this works on a stock JDK 10 using Graal. Whenever I see a speed boost for doing what is conceptually the same thing, I'm always curious where the fat was cut. What did we give up? You can dump the resulting assembly with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly`, and a diff might be revealing.

My hunch is that this line from the tutorial makes all the difference: `@CFunction(transition = Transition.NO_TRANSITION)`

Explanation of NO_TRANSITION from [0]: "No prologue and epilogue is emitted. The C code must not block and must not call back to Java. Also, long running C code delays safepoints (and therefore garbage collection) of other threads until the call returns."

That is probably great for BLAS-like calls. It lines up with my understanding from Cliff Click's great talk "Why is JNI Slow?" [1], which basically says that to be faster you need to make assumptions about what the native code can and can't do, and that developers would generally shoot themselves in the foot.

[0]: https://github.com/oracle/graal/blob/master/sdk/src/org.graa...
[1]: https://www.youtube.com/watch?v=LoyBTqkSkZk
|
Because "JNI is slow" was the conventional wisdom, and the calls would be very frequent, people had ignored it as an option.
One of the devs who was most bothered by the bottleneck happened to have a spare hour, threw the conventional wisdom out the window, dropped in JNI calls to a standard (highly optimised) library, and re-benchmarked. The result was a 40% performance boost. Further experiments found that "JNI is slow" isn't quite as true as conventional wisdom had it.