Hacker News
by thrown_22 1416 days ago
>if I need to do low latency

If the JVM is considered low latency I shudder to think what is high latency.


Java is pretty fast. Second most popular language in HFT. Can get it to a few tens of micros. Not as fast as C++ at sub 5 micros. So good enough for many latency sensitive apps.
> Not as fast as C++ at sub 5 micros.

Try sub 5 nanos. I was curious a while ago about how fast C++ hash set lookup was compared to C#, and it consistently performed a lookup in 1 nanosecond. I tested with up to 6GB of data and then stopped because it was taking longer to generate random data than it was to run the benchmark 10,000 times.

C++ benchmarks here[0]. It's a bit more complicated than just a pure lookup since I was pulling some code out of a larger app, but the benchmark is only measuring the lookup speed. I did the C# benchmarks with BenchmarkDotNet or something like that, I can never remember the exact name.

[0]: https://gist.github.com/ambrosiogabe/66a6e2fdc77e6a600e570f4...

> and it consistently performed a lookup at 1 nanosecond.

TBH I'm skeptical that you are measuring what you think you are measuring. There are a lot of micro-benchmarking pitfalls, like dead code elimination, loop-invariant code motion, unrolling, and other issues. Unless you actually looked at the machine code coming out of the compiler, you're measuring something you don't understand. E.g. 1 nanosecond is roughly 3-6 instructions. That 100% means the hash lookup has been inlined into the benchmarking loop.

Are your hashtables mostly empty? Really small? Lots of easy hits (or easy misses)? Because the slow cases (actually looking up) are going to be hairier and may not be inlined.

Did you benchmark against Java's HashMap? Because it is also very, very, very fast for simple cases.
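To make the dead-code-elimination point concrete, here is a minimal hand-rolled sketch, not a rigorous benchmark. The class and method names (`LookupBench`, `buildSet`, `countHits`) are made up for illustration, and the probe mix (roughly half hits, half misses) is an assumption. The point is only that every lookup result feeds a value that gets printed, so the JIT cannot throw the loop away, and that the timed run happens after a warm-up.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

final class LookupBench {
    // Builds a set holding the even numbers 0, 2, ..., 2*(n-1).
    static Set<Long> buildSet(int n) {
        Set<Long> set = new HashSet<>(n * 2);
        for (long i = 0; i < n; i++) {
            set.add(2 * i);
        }
        return set;
    }

    // Counts how many probes are present. Returning the count makes every
    // lookup observable, so the JIT cannot discard the loop as dead code.
    static int countHits(Set<Long> set, long[] probes) {
        int hits = 0;
        for (long p : probes) {
            // p is autoboxed to Long here; that cost is part of what is timed.
            if (set.contains(p)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        Set<Long> set = buildSet(n);

        // Random probes over [0, 2n): roughly half hit and half miss, in an
        // order that is not trivially predictable.
        Random rng = new Random(42);
        long[] probes = new long[n];
        for (int i = 0; i < n; i++) {
            probes[i] = rng.nextInt(2 * n);
        }

        long sink = 0;
        for (int i = 0; i < 20; i++) { // warm-up so the JIT compiles countHits
            sink += countHits(set, probes);
        }

        long start = System.nanoTime();
        int hits = countHits(set, probes);
        long elapsed = System.nanoTime() - start;

        // Printing hits and sink keeps both results live.
        System.out.printf("hits=%d sink=%d ns/lookup=%.1f%n",
                hits, sink, (double) elapsed / probes.length);
    }
}
```

For anything you intend to publish, a harness such as JMH (or BenchmarkDotNet on the C# side) is the safer choice, since it handles warm-up, forking, and result consumption for you.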

It looks like caching definitely skewed the results a bit. You can take a look at the linked code yourself. Worst case was still only around 80 nanoseconds which is definitely slower, but still orders of magnitude faster than "sub 5 micros".

Don't take my word for it though, you can take a look at the Robin Hood benchmarks[0]. Robin Hood unordered map is a competitive hash map that's performed much better than the STL for me in many cases. They average a 4 nanosecond lookup speed for a hash map with 2000 elements and an integer key.

> Did you benchmark against Java's HashMap?

I benchmarked against C#, which has a runtime that performs similarly to, if not better than, the JVM. The C# code was a ~~few microseconds~~ around 130 nanoseconds, which is still very fast, but up to 100x slower. (And yes, this was after warming up the code. I used BenchmarkDotNet[1] here.) This is a really easy benchmark to set up. If you doubt me, you can write a couple of benchmarks in under an hour and compare for yourself.

[0]: https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-04-0...

[1]: https://benchmarkdotnet.org/articles/overview.html

Did I read that right? The C# version is computing SHA256 hashes?
That's a link to the benchmark framework that I used. The C# benchmarks are in a separate gist that I didn't feel like digging up. Here are the C# benchmarks[0]. All the interfaces and indirection are the result of me adapting this from a separate comment. But the benchmark is just testing `HashSet.TryGetValue`.

Edit: I just re-ran the benchmarks because I didn't have the results pasted in the snippet (which I've now done so I don't keep getting this wrong haha). The C# HashSet takes around 130 nanoseconds, not microseconds. So it's not orders of magnitude slower, but it is still more than a 2x slowdown and up to a 100x slowdown in the case of an integer key.

[0]: https://gist.github.com/ambrosiogabe/ba6bd0fa80588c2fd2ca26d...

I'm talking about an end to end HFT system in C++.
All our market data feed handlers are sub 5 micros at the 99th percentile. All in Java.
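For anyone unsure what "in the 99th" means in practice, here is a rough sketch of a nearest-rank percentile over recorded latencies. `LatencyStats` and its sample data are hypothetical, purely for illustration; real feed handlers typically record into a histogram rather than sorting raw samples.

```java
import java.util.Arrays;

final class LatencyStats {
    // Nearest-rank percentile: the smallest sample such that at least
    // pct percent of all samples are less than or equal to it.
    static long percentile(long[] samplesNanos, double pct) {
        if (samplesNanos.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (pct <= 0 || pct > 100) {
            throw new IllegalArgumentException("pct must be in (0, 100]");
        }
        long[] sorted = samplesNanos.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Made-up data: 98 fast samples and 2 slow outliers.
        long[] samples = new long[100];
        Arrays.fill(samples, 3_000L);
        samples[10] = 200_000L;
        samples[70] = 200_000L;

        System.out.println("p50 ns = " + percentile(samples, 50));
        System.out.println("p99 ns = " + percentile(samples, 99));
    }
}
```

With this made-up data the median looks fine while the 99th percentile lands on the outliers, which is why latency claims are usually quoted at a percentile rather than as an average.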
Java is heavily used in high-frequency trading. I believe it's the most popular language after C++.
Indeed. Shut off garbage collection completely and it can work. (And make sure your Java code creates no garbage, which is a new type of programming in and of itself.)
Maybe I'm taking your comment the wrong way, but why is this a bad thing? What other GC'd language just lets you turn it off?
I don't think their comment is intended to be negative really. It reads more as appreciative of the option, while cognizant of the fact that using it introduces a new challenge.
It's not about turning it off. You just don't allocate on the hot path.
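A minimal sketch of what "don't allocate on the hot path" tends to look like: allocate every object up front, then overwrite them in place. `QuoteRing`, `Quote`, and the field names are invented for this example, and the sketch assumes a single writer thread with no concurrent readers.

```java
final class QuoteRing {
    // Reusable slot. All fields are primitives, so updating one allocates nothing.
    static final class Quote {
        long instrumentId;
        long priceTicks;
        long sizeLots;
    }

    private final Quote[] slots;
    private final int mask;
    private long next; // sequence number of the next slot to overwrite

    QuoteRing(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        slots = new Quote[capacityPowerOfTwo];
        for (int i = 0; i < slots.length; i++) {
            slots[i] = new Quote(); // every allocation happens here, at startup
        }
        mask = capacityPowerOfTwo - 1;
    }

    // Hot path: overwrites the oldest slot in place and returns it. No `new`.
    Quote publish(long instrumentId, long priceTicks, long sizeLots) {
        Quote q = slots[(int) (next++ & mask)];
        q.instrumentId = instrumentId;
        q.priceTicks = priceTicks;
        q.sizeLots = sizeLots;
        return q;
    }

    public static void main(String[] args) {
        QuoteRing ring = new QuoteRing(1024);
        Quote last = null;
        for (long i = 0; i < 1_000_000; i++) {
            last = ring.publish(i % 16, 10_000 + i, 1); // steady state: zero garbage
        }
        System.out.println("last price ticks = " + last.priceTicks);
    }
}
```

Once the steady state creates no garbage, the collector has nothing to do. That is also what makes a no-op collector such as Epsilon (`-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC`, JDK 11+) workable for this style of code.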
> This is because Zing uses a unique collector called C4 (Continuously Concurrent Compacting Collector) that allows pauseless garbage collection regardless of the Java heap size.

This is incorrect. Firstly, C4 triggers two types of safepoints: thread-local and JVM-wide. The latter can easily go into the region of ~200 micros for heaps of ~32GB even when running on fast, overclocked servers. Secondly, the design of C4 incurs a performance penalty for accessing objects due to memory barriers. This impacts median latency noticeably.

You might not believe me, but ask Gil and he'll openly admit it. This article was written by someone who:

1) doesn't know how C4 works

2) doesn't analyze relevant metrics from their JVMs

Edit: My bad - unfortunate wording. I didn't mean this as negative at all. It's a cool (if niche) ability.
So is Python. Doesn't mean you use it for your latency-critical software.
Depends on the use case, but if you are working on web servers or other long-lived processes, the JVM is pretty close to native and can even beat native code thanks to JIT compilation.