by soldergenie 3807 days ago
I ran into the need for this recently, except for a program written in Java... I really wish the JVM had a 128-bit data type (or at least a muldiv function, where muldiv(a, b, c) = ab/c but the intermediate product ab can exceed 64 bits).
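To pin down the semantics being asked for, here is a minimal reference sketch of muldiv using `java.math.BigInteger` for the wide intermediate product. The class name `MulDivReference` is made up for illustration, and the choices to truncate toward zero and to throw when the quotient does not fit in a `long` are assumptions, not something the post specifies. It allocates objects, so it is a correctness baseline rather than a fast path.

```java
import java.math.BigInteger;

// Hypothetical helper class; not part of the JDK.
final class MulDivReference {
    // Computes (a * b) / c with an arbitrary-precision intermediate product.
    // The division truncates toward zero, like Java's long division.
    static long muldiv(long a, long b, long c) {
        BigInteger q = BigInteger.valueOf(a)
                .multiply(BigInteger.valueOf(b))
                .divide(BigInteger.valueOf(c)); // ArithmeticException if c == 0
        // Throws ArithmeticException if the quotient does not fit in a long.
        return q.longValueExact();
    }
}
```

For example, `MulDivReference.muldiv(Long.MAX_VALUE, 4L, 8L)` goes through a 65-bit intermediate product but still returns a value that fits in a `long`.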
4 comments

You could split a into multiple components (1st to 4th byte, 5th to 8th byte) and multiply/divide those components separately. The intermediate results would then still fit into 64 bits.

Partial example is here: https://codereview.stackexchange.com/questions/72286/multipl...
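As a rough sketch of the splitting idea for the multiplication half of the problem: both operands are split into 32-bit halves, so every partial product fits in a 64-bit `long`, and the partial products are recombined into the high word of the 128-bit result. This is the standard schoolbook scheme, not the code from the linked answer, and the names `Mul128Split`, `multiplyHighUnsigned` and `multiplyLow` are made up for illustration. Operands are treated as unsigned 64-bit values.

```java
// Hypothetical helper class; not part of the JDK.
final class Mul128Split {
    private static final long MASK32 = 0xFFFFFFFFL;

    // High 64 bits of the unsigned 128-bit product x * y.
    static long multiplyHighUnsigned(long x, long y) {
        long x0 = x & MASK32, x1 = x >>> 32; // low and high 32-bit halves of x
        long y0 = y & MASK32, y1 = y >>> 32; // low and high 32-bit halves of y

        long w0 = x0 * y0;                   // low * low
        long t  = x1 * y0 + (w0 >>> 32);     // high * low plus carry; fits in 64 bits
        long w1 = (t & MASK32) + x0 * y1;    // middle word plus low * high
        long w2 = t >>> 32;                  // carry into the high word
        return x1 * y1 + w2 + (w1 >>> 32);
    }

    // Low 64 bits of the product: ordinary wrapping multiplication already gives these.
    static long multiplyLow(long x, long y) {
        return x * y;
    }
}
```

The division of the resulting 128-bit value by c is not covered here, and that is the more expensive half to do with primitives only.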

Yeah, I did try that, but performance was lacking. Java, if you stick with primitives and use a C-like programming style (minimal objects), was pretty close to C performance when I tried it, except for this one gap.
As I read the article, I was wondering if the GCC compiler was converting some of the code to intrinsics?

If this is the case (and I'm not a Java programmer), I was wondering if there is a way in Java to tell the JVM to do something similar and, if so, how the JVM copes with different CPUs. Does the JVM, on x86-64 for example, know when it can use SSE instructions?

Speaking of HotSpot: no intrinsics for >64 bits. Comparing Strings and copying arrays do utilize SSE. There is minimal support for auto-vectorizing some simple loops. [0] (4y old thread)

[0]: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/...

In those cases you could of course opt to write the high-performance bit in C and have Java call that. No reason to stick to one language if performance is critical.
The overhead of calling a JNI function is considerable, so this only works for functions that do a relatively large amount of work. There are also a ton of other drawbacks: you've now escaped the JVM, you miss out on memory management for those functions (so better make them stateless), and there is a host of other potential pitfalls depending on what platform you're doing this on.

Definitely not an 'of course', and for critical performance if the function is small enough it might actually be slower to write it in C and call it from the JVM.

Indeed, you need relatively large direct ByteBuffers and, ideally, to parallelize the JNI calls to take advantage of multiple cores. It's a non-trivial task to get stable performance boosts unless the data volume is sufficient.
Write one? It's simple algebra.
The problem is performance when working with an entire object rather than a primitive type.
Your homegrown object wouldn't have any special meaning in the JVM and wouldn't be able to be optimized as a 128-bit integer.
Parent probably means writing the muldiv function, which would return a "basic" 64-bit int.
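A primitives-only muldiv along those lines might look like the sketch below. It assumes Java 9 or later, where `Math.multiplyHigh` is available, and it only handles a, b >= 0 and c > 0; both restrictions, the class name `MulDivPrimitive`, and the choice to throw on overflow are assumptions made for illustration. The division is a plain bit-by-bit shift-and-subtract loop, chosen for clarity and not tuned for speed.

```java
// Hypothetical helper class; not part of the JDK.
final class MulDivPrimitive {
    // Computes floor((a * b) / c) using only long arithmetic, no objects.
    static long muldiv(long a, long b, long c) {
        if (a < 0 || b < 0 || c <= 0) {
            throw new IllegalArgumentException("sketch handles a, b >= 0 and c > 0 only");
        }
        long hi = Math.multiplyHigh(a, b); // upper 64 bits of the 128-bit product (Java 9+)
        long lo = a * b;                   // lower 64 bits (wrapping multiply)
        if (hi >= c) {
            throw new ArithmeticException("quotient does not fit in 64 bits");
        }
        // Long division of the 128-bit value (hi:lo) by c, one bit at a time.
        // Invariant: r < c < 2^63, so (r << 1) | bit never loses a bit.
        long r = hi;
        long q = 0;
        for (int i = 63; i >= 0; i--) {
            r = (r << 1) | ((lo >>> i) & 1L);
            q <<= 1;
            if (Long.compareUnsigned(r, c) >= 0) {
                r -= c;
                q |= 1L;
            }
        }
        if (q < 0) {
            throw new ArithmeticException("quotient does not fit in a signed long");
        }
        return q;
    }
}
```

Nothing here allocates, so the result is a plain `long` as the parent suggests. Whether the loop is fast enough for a hot path would need measuring.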