Hacker News
by darksaints 3996 days ago
Does anybody here have any experience with Go's garbage collection pauses with large heap sizes? I've got a scala app that regularly consumes about 48G of ram, and I'm very happy with the response times during heavy loads like this, but the P99.5 is abysmal because of garbage collection. I've tried tuning it, but it doesn't seem like anything I do helps. I'll probably end up using an Azul JVM, but I'm curious how other languages handle this problem.
6 comments

This article discusses how they handled 69GB of heap and pause times of up to 6 seconds:

http://blog.golang.org/qihoo

The new garbage collector in Go 1.5 should improve things.

Heaps of 32GB or more can't use compressed pointers, so don't forget about that bit of overhead. 48GB is right in that annoying zone where you need to go above a 48GB heap to get the real benefit of more than 32GB of effective heap.

But I only maintain EE apps on heaps of at most 10GB, so I'm not hugely experienced with tuning this. All I can recommend is setting the heap size and CMS thresholds (I have no G1 experience) so that at steady state you don't end up hitting full GCs.
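On the compressed pointers point above, here is a minimal sketch for checking what a running JVM actually decided. It assumes a HotSpot-based JVM, where `com.sun.management.HotSpotDiagnosticMXBean` is available; the class name `CompressedOopsCheck` is made up for illustration, and on other JVMs the method just reports "unknown".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class CompressedOopsCheck {
    /** Returns "true" or "false" as reported by HotSpot, or "unknown" on other JVMs. */
    static String compressedOops() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption("UseCompressedOops").getValue();
        } catch (RuntimeException | LinkageError e) {
            // Assumption: not a HotSpot JVM, or the option does not exist on this build.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap (MiB):    " + maxHeapMiB);
        System.out.println("UseCompressedOops: " + compressedOops());
    }
}
```

Running it once with a small `-Xmx` and once with a large one should show where the setting flips on your particular JVM build.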

For a JVM heap size greater than 32GB you should be using G1GC.

Tuning GC for Spark: https://www.youtube.com/watch?v=drmJDISLkf4

In the Big Data space we have dozens of machines all with very large heap sizes (I run mine with 250GB) and don't run into any major stop-the-world pauses.

> I've got a scala app that regularly consumes about 48G of ram, and I'm very happy with the response times during heavy loads like this, but the P99.5 is abysmal because of garbage collection. I've tried tuning it, but it doesn't seem like anything I do helps.

When tuning a prod Scala deployment a while back, I encountered a nasty "pregnant pause" every once in a while, similar to what you have experienced. So, below is what I used, in annotated form and adapted for the deployment you describe (the max heap setting). Some of these may be completely obvious to you (or others reading), yet are included for completeness.

  -server
  This one should be obvious :-)
  
  -Xmx54G
  Memory is cheap, so give the JVM 54 gig so that GC
  isn't forced to run when your system is in the
  steady-state of 48G heap utilization.
  
  -XX:PermSize=128m -XX:MaxPermSize=1024M
  Ditto on the cheapness of memory.
  
  -Xss1M
  A stack size of 1 meg seems a bit much, but does
  allow for recursive algorithms to operate with
  impunity.
  
  -XX:ReservedCodeCacheSize=128m
  This one was needed for Scala.  It likely should
  be specified with a high value since it limits the
  JIT's code cache.
  
  -XX:+DoEscapeAnalysis
  A nice way to relieve some heap pressure[1].
  
  -XX:+UseCodeCacheFlushing
  Should the ReservedCodeCacheSize be exceeded, this
  lets the JIT continue to do its thing in an LRU
  type of fashion (I believe).
  
  -XX:+UseParallelGC
  This one is the most impactful one of all.  A lot
  of people will say "use UseConcMarkSweepGC!"  They
  are wrong for high volume server deployments.  The
  concurrent mark and sweep algorithm caused massive
  "pregnant pauses" in prod for me!  The Parallel GC
  algorithm performs much better under load and
  doesn't cause the VM to sit-and-spin for 20+
  seconds.
  
  -XX:+UseCondCardMark
  Another tweak which had a major performance boost
  for me[2].
  
  -XX:+UseNUMA
  If your servers are NUMA[3] based, then this can
  significantly increase performance[4] as well.
HTH

1 - http://www.ibm.com/developerworks/java/library/j-jtp09275/in...

2 - https://blogs.oracle.com/dave/entry/false_sharing_induced_by...

3 - https://en.wikipedia.org/wiki/Non-uniform_memory_access

4 - http://jose-manuel.me/2011/06/numa-bb/

EDIT: Inserted newlines to eliminate horizontal scrolling.
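To confirm that flags like the ones listed above actually reached the JVM, something along these lines can be run inside the app at startup. This is only a sketch using the standard `java.lang.management` API; the class name `GcFlagCheck` is hypothetical, and the collector names printed depend on the JVM vendor and version.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcFlagCheck {
    /** Names of the collectors the running JVM registered. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** The -X / -XX style arguments the JVM was started with. */
    static List<String> jvmFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("JVM flags:  " + jvmFlags());
        System.out.println("Collectors: " + collectorNames());
    }
}
```

With `-XX:+UseParallelGC` on a HotSpot JVM I would expect collector names along the lines of "PS Scavenge" and "PS MarkSweep", but treat that as an assumption and check what your build prints.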

Have you tried the new G1 GC? I'm not very familiar with GC tuning, but I thought using the CMS GC with very large heaps is asking for trouble. Apparently the G1 GC alleviates this, and still manages to be as "cool" as the CMS GC.

> Have you tried the new G1 GC?

IIRC, I did and it didn't benchmark well for my needs. However, which GC is best for any given deployment depends on what JRE you're using and what the system is doing. Classic case of YMMV and all that.

In the end, even though it may sound trite, the only way to know what works best for a given combination of JRE/OS/hardware is to measure it. This article[1] had some good tips and Mission Control[2] is a huge help in this arena.

1 - http://www.infoq.com/articles/Tuning-Java-Servers?utm_source...

2 - http://docs.oracle.com/javacomponents/jmc-5-5/jmc-user-guide...
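As a rough illustration of the "measure it" advice, the sketch below reads the JVM's own GC counters before and after a piece of work. The class name `GcTimeProbe` and the allocation loop are hypothetical stand-ins for a real workload, and `getCollectionTime()` reports approximate accumulated collection time, which is not the same thing as the worst single pause.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcTimeProbe {
    /** Sum of approximate accumulated collection time (ms) across all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 means undefined for this collector
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Sum of collection counts across all collectors. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means undefined for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long millisBefore = totalGcMillis();
        long countBefore = totalGcCount();
        long start = System.nanoTime();

        // Hypothetical stand-in workload: churn through short-lived arrays.
        long checksum = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] chunk = new byte[16 * 1024];
            chunk[0] = (byte) i;
            checksum += chunk[0];
        }

        long wallMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("checksum:      " + checksum);
        System.out.println("wall time ms:  " + wallMillis);
        System.out.println("collections:   " + (totalGcCount() - countBefore));
        System.out.println("GC time ms:    " + (totalGcMillis() - millisBefore));
    }
}
```

Running the same workload under different collector flags and comparing the numbers gives a first-order comparison; for individual pause lengths, GC logs or Mission Control are the better tools.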

It's really more about how many objects are on the heap, less about how large the heap is.

What is P99.5?

The 99.5th percentile. Below that threshold, requests are generally fine, but a small fraction of requests take way too long to go through because GC kicks in.
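For anyone who wants to see the percentile idea concretely, here is a small sketch using the nearest-rank definition (one of several common ways to define a percentile). The class name `LatencyPercentile` and the latency numbers are made up for illustration.

```java
import java.util.Arrays;

final class LatencyPercentile {
    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up data: 990 requests at 20 ms, 10 requests stalled for 4000 ms.
        double[] latencies = new double[1000];
        Arrays.fill(latencies, 20.0);
        for (int i = 0; i < 10; i++) {
            latencies[i] = 4000.0;
        }
        System.out.println("P50 ms:   " + percentile(latencies, 50.0));
        System.out.println("P99.5 ms: " + percentile(latencies, 99.5));
    }
}
```

With this made-up data the median comes out at 20 ms while P99.5 lands on 4000 ms, which is the shape of problem described above: the typical request is fine and the tail is dominated by the stalls.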