by jakewins 3025 days ago
This is correct; the standard approach here is to use regular C-style memory management for the data the system is managing, and the JVM heap only for the database "infrastructure".

This hybrid approach gives the benefit of a managed runtime and safety of GC for most of your code, but allows the performance of raw pointers/malloc for key code paths.

Some examples of this pattern on the JVM:

- The Neo4j Page Cache, Muninn, https://github.com/neo4j/neo4j/blob/3.4/community/io/src/mai...

- The Netty project's implementation of jemalloc for the JVM: https://github.com/netty/netty/blob/4.1/buffer/src/main/java...
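
To make the hybrid idea concrete, here is a minimal sketch, assuming a store of fixed-width key/value records. The class `OffHeapRecords` and its methods are made up for illustration; it uses the standard `ByteBuffer.allocateDirect` rather than a custom allocator, so it is not how either linked project does it. The records live outside the JVM heap, and the GC only sees the one small buffer object that wraps them.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical fixed-width record store whose data lives outside the JVM heap.
final class OffHeapRecords {
    private static final int RECORD_SIZE = Long.BYTES * 2; // 8-byte key + 8-byte value

    private final ByteBuffer buf;
    private final int capacity;

    OffHeapRecords(int capacity) {
        this.capacity = capacity;
        // allocateDirect reserves native memory; the GC tracks only the wrapper object.
        this.buf = ByteBuffer
                .allocateDirect(Math.multiplyExact(capacity, RECORD_SIZE))
                .order(ByteOrder.nativeOrder());
    }

    void put(int slot, long key, long value) {
        int off = offset(slot);
        buf.putLong(off, key);                 // absolute writes, no position bookkeeping
        buf.putLong(off + Long.BYTES, value);
    }

    long key(int slot) {
        return buf.getLong(offset(slot));
    }

    long value(int slot) {
        return buf.getLong(offset(slot) + Long.BYTES);
    }

    // Bounds-checked byte offset of a record; this is the "pointer arithmetic" step.
    private int offset(int slot) {
        if (slot < 0 || slot >= capacity) {
            throw new IndexOutOfBoundsException("slot " + slot);
        }
        return slot * RECORD_SIZE;
    }

    public static void main(String[] args) {
        OffHeapRecords records = new OffHeapRecords(1_000_000);
        records.put(42, 7L, 99L);
        System.out.println(records.key(42) + " -> " + records.value(42)); // prints "7 -> 99"
    }
}
```

A million records here cost the collector one object to track instead of a million, which is the point of keeping the managed data off the heap while the surrounding code stays ordinary Java.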

2 comments

> but allows the performance of ... malloc for key code paths.

Everything is relative, I guess.

Hah :) I don't mean that malloc itself is fast, I mean that having non-JVM-heap memory is fast.

You can't use raw pointers to refer to a permanent memory block on the JVM heap, since the GC moves objects around. And even though those blocks will never be collected, they are still additional work for the GC to track.

If so, any JVM-based datastore could probably benefit.

I wonder how long it will be before we see a similar result from Elasticsearch (the only other huge JVM-based store I can think of).

HBase is going off-heap as much as possible. VoltDB uses Java for management and C++ for the low-level parts.

They will end up writing C++ in Java eventually, depending on how much performance you REALLY need.

The same goes for Elasticsearch: if you want performance, you need to do the same thing ScyllaDB did to Cassandra (per-core sharding, skip filesystem across cores, etc.).
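
For anyone unfamiliar with per-core sharding, a rough sketch of the idea in Java follows. Everything here (`ShardedStore`, its methods, the hash routing) is hypothetical and only illustrates the shape: each shard's data is touched by exactly one thread, so the data path needs no locks. A plain JVM does not pin threads to cores, so this is shard-per-thread at best, not what ScyllaDB actually does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical shard-per-thread store: each shard's map is only ever
// accessed from its own single-threaded executor, so it needs no locking.
final class ShardedStore implements AutoCloseable {
    private final ExecutorService[] workers;
    private final Map<String, String>[] shards;

    @SuppressWarnings("unchecked")
    ShardedStore(int shardCount) {
        workers = new ExecutorService[shardCount];
        shards = new Map[shardCount];
        for (int i = 0; i < shardCount; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
            shards[i] = new HashMap<>(); // plain HashMap is safe: one owner thread
        }
    }

    // Route a key to its owning shard; floorMod keeps negative hashes in range.
    private int shardOf(String key) {
        return Math.floorMod(key.hashCode(), workers.length);
    }

    CompletableFuture<Void> put(String key, String value) {
        int s = shardOf(key);
        return CompletableFuture.runAsync(() -> shards[s].put(key, value), workers[s]);
    }

    CompletableFuture<String> get(String key) {
        int s = shardOf(key);
        return CompletableFuture.supplyAsync(() -> shards[s].get(key), workers[s]);
    }

    @Override
    public void close() {
        for (ExecutorService w : workers) {
            w.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        try (ShardedStore store = new ShardedStore(cores)) {
            store.put("user:1", "alice").join();
            System.out.println(store.get("user:1").join()); // prints "alice"
        }
    }
}
```

Requests for the same key always land on the same worker and run in submission order, which is what lets each shard skip synchronization.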

In Elasticsearch terms, vespa.ai, which claims better performance/scalability/maintainability, uses C++ for the Lucene layer and Java for the Solr/Elasticsearch layer.

There are blog posts about speeding up Lucene by 2x+ by rewriting some parts in C/C++. There are also libraries (Trinity) claiming 2x+ performance.

I've also read a Google engineer saying "Bigtable is 3x faster than HBase".

People with an interest may want to check out Azul's Zing JVM and the following blog post testing it out with Lucene... http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-w...