by openasocket 3025 days ago
I'm not an expert on these things, but it seems to me if you're implementing a database in Java you wouldn't want to keep your data on the JVM Heap, as this seems to indicate. My understanding is that in most applications (like servers) the average object lives for a very short period of time, and most GC implementations are built from that idea. But, in a database, especially an in-memory database, the majority of the objects are going to live for a very long time. That makes the mark phase of GC a lot more expensive, puts more pressure on the generations, etc.

Is my guess here correct, or are there things I'm missing or mistaken on?

5 comments

This is correct; the standard approach here is to use regular C-style memory management for the data the system is managing, and the JVM heap only for the database "infrastructure".

This hybrid approach gives the benefit of a managed runtime and safety of GC for most of your code, but allows the performance of raw pointers/malloc for key code paths.

Some examples of this pattern on the JVM:

- The Neo4j page cache, Muninn: https://github.com/neo4j/neo4j/blob/3.4/community/io/src/mai...

- The Netty project's implementation of jemalloc for the JVM: https://github.com/netty/netty/blob/4.1/buffer/src/main/java...
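To make the hybrid idea concrete, here is a minimal sketch, not taken from Neo4j or Netty. It assumes fixed-width records (an 8-byte key plus a 4-byte value) and uses a plain direct `ByteBuffer` as the off-heap slab. The class name `OffHeapRecordStore` and its methods are made up for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical fixed-width record store. The payload lives in one direct
// (off-heap) buffer, so the GC tracks a single small ByteBuffer object
// instead of one long-lived heap object per record.
final class OffHeapRecordStore {
    // Assumed record layout: 8-byte key followed by 4-byte value.
    private static final int RECORD_BYTES = Long.BYTES + Integer.BYTES;

    private final ByteBuffer slab;
    private final int capacity;

    OffHeapRecordStore(int capacity) {
        this.capacity = capacity;
        this.slab = ByteBuffer.allocateDirect(capacity * RECORD_BYTES);
    }

    void put(int slot, long key, int value) {
        int offset = offsetOf(slot);
        slab.putLong(offset, key);               // absolute write, no position change
        slab.putInt(offset + Long.BYTES, value);
    }

    long keyAt(int slot) {
        return slab.getLong(offsetOf(slot));
    }

    int valueAt(int slot) {
        return slab.getInt(offsetOf(slot) + Long.BYTES);
    }

    // Bounds-checked translation from slot number to byte offset.
    private int offsetOf(int slot) {
        if (slot < 0 || slot >= capacity) {
            throw new IndexOutOfBoundsException("slot " + slot);
        }
        return slot * RECORD_BYTES;
    }

    public static void main(String[] args) {
        OffHeapRecordStore store = new OffHeapRecordStore(1_000);
        store.put(42, 7L, 99);
        System.out.println(store.keyAt(42) + " -> " + store.valueAt(42)); // prints "7 -> 99"
    }
}
```

The "infrastructure" (the store object, bounds checks, callers) stays ordinary garbage-collected Java; only the record bytes sit outside the heap. Real page caches and pooled allocators add slot reuse, concurrency control and explicit release, which this sketch leaves out.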

> but allows the performance of ... malloc for key code paths.

Everything is relative, I guess.

Hah :) I don't mean that malloc itself is fast, I mean that having non-jvm heap memory is fast.

You can't hold raw pointers into a permanent memory block on the JVM heap, since the GC moves objects around. And even though those blocks will never be collected, they are additional work for the GC to track.

If so, any JVM-based datastore could probably benefit.

I wonder how long before we see a similar result from Elasticsearch (the only other huge JVM-based store I can think of).

HBase is going off-heap as much as possible. VoltDB uses Java for management and C++ for the low-level parts.

They will end up writing C++ in Java eventually, depending on how much performance you REALLY need.

The same goes for Elasticsearch: if you want performance, you need to do the same thing ScyllaDB did to Cassandra (per-core sharding, skip filesystem across cores, etc.).

In Elasticsearch terms, vespa.ai, which claims better performance/scalability/maintainability, uses C++ for the Lucene layer and Java for the Solr/Elasticsearch layer.

There are blog posts about speeding up Lucene by 2x+ by changing some stuff to C/C++. There are libraries (Trinity) claiming 2x+ performance.

I've read a Google engineer saying "bigtable is 3x faster than hbase".

People with an interest may want to check out Azul's Zing JVM and the following blog post testing it out with Lucene... http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-w...

Cassandra does a bunch of stuff off-heap - we keep things like Bloom filters, our compression offsets (to seek into compressed data files), and even some of the memtable (the in-memory buffer before flushing) in direct memory, primarily for the reasons you describe.

We still have "other" things on-heap. The biggest contributor to GC pain tends to be the number of objects allocated on the read path, so this patch works around that by pushing much of that logic to RocksDB.

There are certainly other things you can do in the code itself that would also help - one of the biggest contributors to garbage is the column index. CASSANDRA-9754 fixes much of that (the JIRA ticket is inactive, but the development work on it is ongoing).

The purpose of separating into young and old generations is that it's easier to find dead objects in the young generation (as you said, the average object lives for a short period of time). You only have to scan this subset for a minor GC. It doesn't really matter how many long-lived objects you have, as long as you can avoid needing to do a major GC.

Don't you still need to scan the old generation during minor GC, in case a field in one of the older objects was modified to point to an object in the young generation? Or are there optimizations you can use to quickly and efficiently find references from the older generation to the younger?

> in case a field in one of the older objects was modified to point to an object in the young generation?

This is typically handled by https://en.wikipedia.org/wiki/Write_barrier#In_Garbage_colle...
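As a rough illustration of how a write barrier avoids scanning the whole old generation, here is a toy model of card marking, one common way to implement it. This is a simulation in plain Java, not how any particular JVM is coded: real collectors emit the barrier in compiled code. The class name, the integer "addresses" and the 512-byte card size are assumptions for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a card table. The old generation is divided into fixed-size
// "cards"; a store into an old object dirties the card that contains it.
final class CardTableSketch {
    static final int CARD_SIZE = 512; // bytes covered by one card (assumed)

    private final byte[] cards;

    CardTableSketch(int oldGenBytes) {
        // One byte per card, rounding up so the last partial card is covered.
        cards = new byte[(oldGenBytes + CARD_SIZE - 1) / CARD_SIZE];
    }

    // The "write barrier": called on every reference store into an old-gen
    // object, identified here by a made-up integer address.
    void onReferenceStore(int oldGenAddress) {
        cards[oldGenAddress / CARD_SIZE] = 1;
    }

    // Minor GC: only dirty cards need scanning for old-to-young references.
    // Returns the dirty card indexes and resets them to clean.
    List<Integer> drainDirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] != 0) {
                dirty.add(i);
                cards[i] = 0;
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(1 << 20); // 1 MiB old gen
        table.onReferenceStore(1_000);   // card 1
        table.onReferenceStore(1_023);   // card 1 again
        table.onReferenceStore(70_000);  // card 136
        System.out.println(table.drainDirtyCards()); // prints "[1, 136]"
    }
}
```

So the cost of a minor GC depends on how many cards were dirtied since the last collection, not on how big the old generation is. The price is a little extra work on every reference store.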

For a long time, the guidance was to install jemalloc and then use off-heap objects. I can’t recall what the cause was, but that broke in the 3.0.x series and is unlikely to return. The feature stream (3.1.x) allegedly can use jemalloc again, but we’ve been slow to adopt it so I can’t provide proof.

The good news is, at least for my workloads, garbage collection is significantly better in 3.0 than 2.1 even without off-heap objects. Not sure if it's generating less garbage or just generating it in a way that's easier to collect cheaply, but I saw pause times and total collection work drop significantly with the same settings (G1 collector).

If you want to keep data off heap you need to use sun.misc.Unsafe and allocate and free the memory yourself. I guess it is called unsafe for a reason. With G1GC you can do magical things to reduce the GC overhead, which I always recommend as the first step before trying off heap.

ByteBuffer.allocateDirect is another off-JVM-heap solution that's not marked unsafe.

You don't even have to go through Unsafe. ByteBuffer.allocateDirect() gives you a chunk of off-heap memory.
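For anyone who hasn't used it, a small sketch of what that looks like with only the standard `java.nio` API. The class name `DirectBufferDemo` and the `roundTrip` helper are just for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DirectBufferDemo {
    // Writes a long into freshly allocated off-heap memory and reads it back.
    static long roundTrip(long value) {
        ByteBuffer buf = ByteBuffer.allocateDirect(64)
                                   .order(ByteOrder.nativeOrder()); // skip byte swapping
        buf.putLong(0, value);  // absolute put at byte offset 0
        return buf.getLong(0);
    }

    public static void main(String[] args) {
        ByteBuffer offHeap = ByteBuffer.allocateDirect(64); // native memory
        ByteBuffer onHeap = ByteBuffer.allocate(64);        // backed by a byte[] on the heap

        System.out.println(offHeap.isDirect()); // prints "true"
        System.out.println(onHeap.isDirect());  // prints "false"
        System.out.println(roundTrip(42L));     // prints "42"
    }
}
```

One caveat: there is no explicit free here. The native memory is released only after the `ByteBuffer` object itself becomes unreachable and is cleaned up, and the total is capped by `-XX:MaxDirectMemorySize`, so long-lived stores tend to allocate a few large buffers up front and reuse them.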