by ADefenestrator 2518 days ago
I'm not too familiar with JVM internals or Spark, but I know that with Cassandra, at least, there's a cost to off-heap memory. You gain on GC, but eventually you have to copy that data in and out of the JVM's heap to actually use it.
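To make that copy cost concrete, here's a rough sketch using plain `java.nio` direct buffers. This is not how Cassandra or Spark actually do it; the class and method names are made up for illustration. The point is only that every trip across the heap boundary is a copy.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not taken from Cassandra or Spark.
class OffHeapRoundTrip {
    // Copies a heap byte[] into a direct (off-heap) buffer.
    static ByteBuffer toOffHeap(byte[] onHeap) {
        ByteBuffer direct = ByteBuffer.allocateDirect(onHeap.length);
        direct.put(onHeap); // copy: heap -> off-heap
        direct.flip();      // switch the buffer from writing to reading
        return direct;
    }

    // Copies the buffer's remaining bytes back into a fresh heap array.
    static byte[] toHeap(ByteBuffer direct) {
        byte[] onHeap = new byte[direct.remaining()];
        // duplicate() shares the bytes but has its own position,
        // so the caller's buffer is left untouched.
        direct.duplicate().get(onHeap); // copy: off-heap -> heap
        return onHeap;
    }

    public static void main(String[] args) {
        byte[] original = "row-42".getBytes(StandardCharsets.UTF_8);
        ByteBuffer stored = toOffHeap(original); // GC never scans these bytes
        byte[] readBack = toHeap(stored);        // but using them means copying
        System.out.println(new String(readBack, StandardCharsets.UTF_8));
    }
}
```

The off-heap bytes never get scanned by the collector, which is the GC win, but anything that wants a normal Java object out of them pays for the copy (and usually deserialization on top).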

Even for batch processing, long GCs can be bad. It's not just the batch processing that stops, but the whole world. Anything trying to listen for more data, keep track of elapsed time, etc. is going to run into problems. Long pauses can also expose race conditions that would normally be so unlikely that you'd never hit them.
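One cheap way to see the "whole world stops" effect is a thread that sleeps for a short interval and checks how late it wakes up. A minimal sketch follows; the class name, the 10 ms interval, and the 100 ms cutoff are all arbitrary choices of mine. A late wakeup can also come from OS scheduling, so treat a hit as "the process stalled", not as proof of a GC pause.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stall detector; thresholds are illustrative only.
class PauseWatchdog {
    // Extra time beyond the requested sleep, in milliseconds (never negative).
    static long overshootMillis(long requestedMillis, long elapsedNanos) {
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
        return Math.max(0, elapsedMillis - requestedMillis);
    }

    public static void main(String[] args) throws InterruptedException {
        final long sleepMillis = 10;      // assumed sampling interval
        final long thresholdMillis = 100; // assumed cutoff for "this was a stall"
        for (int i = 0; i < 100; i++) {
            long start = System.nanoTime();
            Thread.sleep(sleepMillis);
            long overshoot = overshootMillis(sleepMillis, System.nanoTime() - start);
            if (overshoot > thresholdMillis) {
                // Every other thread in the JVM was likely frozen too.
                System.out.println("stalled for ~" + overshoot + " ms");
            }
        }
    }
}
```

Any heartbeat, timeout, or lease logic in the same JVM sees the same gap, which is how a long pause turns into missed deadlines or a race that normally never fires.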

Static JVM memory overhead isn't bad at all, probably even under your 100-200MB guess. The lack of compact primitives adds some proportional overhead on top of that, but the biggest factor is just the extra "slack" space needed for good GC performance. Depending on the circumstances and requirements, that slack could be anywhere from 30% to 400% of the live data.
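As back-of-the-envelope arithmetic, the slack range matters far more than the static part. A quick sketch, assuming a hypothetical 1024 MiB of live data and 150 MiB of static overhead (both numbers are mine, picked only to show the spread), and ignoring the proportional overhead from object headers and boxing:

```java
// Hypothetical sizing helper; inputs are illustrative, not measurements.
class HeapEstimate {
    // Total footprint = live data plus GC slack, plus fixed JVM overhead.
    static long estimateMiB(long liveDataMiB, double slackFraction, long staticOverheadMiB) {
        return Math.round(liveDataMiB * (1.0 + slackFraction)) + staticOverheadMiB;
    }

    public static void main(String[] args) {
        long live = 1024;       // assumed live data set, MiB
        long staticPart = 150;  // assumed static JVM overhead, MiB
        System.out.println("30% slack:  " + estimateMiB(live, 0.30, staticPart) + " MiB");
        System.out.println("400% slack: " + estimateMiB(live, 4.00, staticPart) + " MiB");
    }
}
```

With those inputs the low end comes out around 1.4 GiB and the high end a bit over 5 GiB, so the static overhead is basically noise next to the slack.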