by AdieuToLogic 3999 days ago
> I've got a scala app that regularly consumes about 48G of ram, and I'm very happy with the response times during heavy loads like this, but the P99.5 is abysmal because of garbage collection. I've tried tuning it, but it doesn't seem like anything I do helps.

When tuning a prod Scala deployment a while back, I encountered a nasty "pregnant pause" every once in a while, similar to what you have experienced. Below is what I used, annotated and adapted for the deployment you describe (the max heap setting). Some of these may be completely obvious to you (or others reading), yet are included for completeness.

  -server
  This one should be obvious :-)
  
  -Xmx54G
  Memory is cheap, so give the JVM 54 gig so that GC
  isn't forced to run when your system is in the
  steady-state of 48G heap utilization.
  
  -XX:PermSize=128m -XX:MaxPermSize=1024M
  Ditto on the cheapness of memory.
  
  -Xss1M
  A stack size of 1 meg seems a bit much, but does
  allow for recursive algorithms to operate with
  impunity.
  
  -XX:ReservedCodeCacheSize=128m
  This one was needed for Scala.  It likely should
  be specified with a high value since it limits the
  JIT's code cache.
  
  -XX:+DoEscapeAnalysis
  A nice way to relieve some heap pressure[1].
  
  -XX:+UseCodeCacheFlushing
  Should the ReservedCodeCacheSize be exceeded, this
  lets the JIT continue to do its thing in an LRU
  type of fashion (I believe).
  
  -XX:+UseParallelGC
  This one is the most impactful one of all.  A lot
  of people will say "use UseConcMarkSweepGC!"  They
  are wrong for high volume server deployments.  The
  concurrent mark and sweep algorithm caused massive
  "pregnant pauses" in prod for me!  The Parallel GC
  algorithm performs much better under load and
  doesn't cause the VM to sit-and-spin for 20+
  seconds.
  
  -XX:+UseCondCardMark
  Another tweak which gave me a major performance
  boost[2].
  
  -XX:+UseNUMA
  If your servers are NUMA[3] based, then this can
  significantly increase performance[4] as well.
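
A quick way to confirm that flags like the ones above actually reached the running JVM is to ask the JVM itself. What follows is a minimal Java sketch, not something taken from the original deployment: the class name JvmFlagCheck is made up for illustration, and it only relies on the standard java.lang.management API.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports what the running JVM was started with.
final class JvmFlagCheck {

    /** Arguments passed to the JVM at startup (for example "-Xmx54G"). */
    public static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Maximum heap the JVM will attempt to use, in MiB. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
        for (String arg : inputArguments()) {
            System.out.println("jvm arg: " + arg);
        }
    }
}
```

Running this with the same options as the app should echo each flag back, which catches typos and launcher scripts that silently drop options.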

HTH

1 - http://www.ibm.com/developerworks/java/library/j-jtp09275/in...

2 - https://blogs.oracle.com/dave/entry/false_sharing_induced_by...

3 - https://en.wikipedia.org/wiki/Non-uniform_memory_access

4 - http://jose-manuel.me/2011/06/numa-bb/

EDIT: Inserted newlines to eliminate horizontal scrolling.


Have you tried the new G1 GC? I'm not very familiar with GC tuning, but I thought using the CMS GC with very large heaps is asking for trouble. Apparently the G1 GC alleviates this, and still manages to be as "cool" as the CMS GC.
> Have you tried the new G1 GC?

IIRC, I did and it didn't benchmark well for my needs. However, which GC is best for any given deployment depends on what JRE you're using and what the system is doing. Classic case of YMMV and all that.

In the end, even though it may sound trite, the only way to know what works best for a given combination of JRE/OS/hardware is to measure it. This article[1] had some good tips and Mission Control[2] is a huge help in this arena.
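
As a rough starting point for that kind of measurement, the JVM exposes per-collector counters through the standard GarbageCollectorMXBean interface. The sketch below is only an illustration (the class name GcTotals is made up), and it reports cumulative totals rather than the pause distribution behind a P99.5, so GC logs or Mission Control are still needed for the full picture.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: sums the GC counters the JVM exposes via JMX.
final class GcTotals {

    /** Total number of collections so far; collectors reporting -1 (undefined) are skipped. */
    public static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds; -1 (undefined) is skipped. */
    public static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Per-collector breakdown, then the totals.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        System.out.println("total collections: " + totalCollections());
        System.out.println("total GC time (ms): " + totalCollectionMillis());
    }
}
```

Sampling these totals before and after a load test, once per candidate GC setting, gives a like-for-like comparison on the same JRE/OS/hardware combination.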

1 - http://www.infoq.com/articles/Tuning-Java-Servers?utm_source...

2 - http://docs.oracle.com/javacomponents/jmc-5-5/jmc-user-guide...