Hacker News
by bjourne 4080 days ago
Yes, it does affect their duration. The larger the nursery, the more GC roots you will need to trace due to write barriers from older generations. So yes, you are right that the duration depends on the number of live objects at GC time, but the number of live objects also depends on the size of the nursery.

(and whoever is down-voting me, maybe you can explain why I'm wrong?)


My understanding is that an entry in the card table is set when an object allocated in the young generation is referenced by something in the old generation. An entry in the card table corresponds to a 512-byte segment of memory in the old generation. Thus, the cost imposed by this would depend on how many distinct 512-byte segments of the old generation reference any objects in the young generation.
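To make the card arithmetic concrete, here is a minimal sketch in Java. It is a toy model, not HotSpot's implementation: the class name, the `recordReferenceStore` barrier hook, and the use of a `BitSet` are all assumptions for illustration. It only shows that many reference stores landing in the same 512-byte segment cost one card.

```java
import java.util.BitSet;

/** Toy card table; names and layout are hypothetical, not HotSpot's. */
final class CardTableSketch {
    // 2^9 = 512 bytes of old generation per card, as described above.
    static final int CARD_SHIFT = 9;

    // One bit per card. Real VMs typically use a byte per card instead.
    private final BitSet dirty;

    CardTableSketch(long oldGenBytes) {
        long cards = (oldGenBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT;
        this.dirty = new BitSet((int) cards);
    }

    /** Maps a byte offset inside the old generation to its card index. */
    static int cardIndex(long oldGenOffset) {
        return (int) (oldGenOffset >>> CARD_SHIFT);
    }

    /** Hypothetical write barrier: called when a reference field at this offset is stored to. */
    void recordReferenceStore(long oldGenOffset) {
        dirty.set(cardIndex(oldGenOffset));
    }

    /** Number of distinct cards a young collection would have to scan for roots. */
    int dirtyCardCount() {
        return dirty.cardinality();
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(1L << 20); // 1 MiB old generation
        table.recordReferenceStore(0);
        table.recordReferenceStore(100); // same card as offset 0
        table.recordReferenceStore(512); // first byte of the next card
        System.out.println(table.dirtyCardCount());
    }
}
```

In this model the scanning cost follows `dirtyCardCount()`, not the number of stores, which matches the "distinct 512-byte segments" point above.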

If you have a web service that mostly consists of some baseline of long-lived objects and many short-lived objects used for fulfilling requests, I would expect to have relatively few GC roots. At that point, if you assume that you have a consistent request rate, I would expect the number of reachable objects in the young generation to remain constant regardless of the size of the young generation, and the number of GC roots should also remain constant. Based on that, increasing the young generation size would then decrease the frequency of young generation garbage collection, reduce the probability of survivors getting promoted to the old generation, and have no effect on the time it takes to do young generation garbage collection. There are certainly applications that behave differently, where the old generation is less static, but I would think that for this use case the new generation size should be as big as it can be.

If something I've said is incorrect or incomplete, I'm anxious to know. There are relatively few well-written explanations of how Java garbage collection works, so it is difficult to have confidence regarding it without a lot of practical experience as you have said.

That's a good explanation of why I'm wrong. Basically you are hoping to reach an equilibrium situation in which 0% of the allocated memory in the nursery is true survivors. Because if the true survival rate were higher than 0%, then the larger the nursery, the longer the duration between collections and the higher the number of objects that are true survivors.
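The two positions can be put side by side in a back-of-the-envelope model. This is only a sketch under stated assumptions: a steady allocation rate, a constant amount of in-flight request data, and "true survivors" modeled as a fixed fraction of everything allocated since the previous collection. The class and method names are made up for illustration.

```java
/** Back-of-the-envelope model of nursery size vs. survivors; all names are hypothetical. */
final class NurserySurvivorModel {

    /** Seconds between young collections if the nursery fills at a steady rate. */
    static double secondsBetweenCollections(long nurseryBytes, long allocBytesPerSecond) {
        return (double) nurseryBytes / allocBytesPerSecond;
    }

    /**
     * Bytes still live when the nursery fills up: in-flight request data
     * (assumed constant) plus a fixed fraction of one nursery's worth of allocation.
     */
    static double liveBytesAtCollection(long nurseryBytes, long inFlightBytes,
                                        double trueSurvivalRate) {
        return inFlightBytes + trueSurvivalRate * nurseryBytes;
    }

    public static void main(String[] args) {
        long inFlight = 8L << 20;    // 8 MiB of request-scoped data, an assumed figure
        long allocRate = 256L << 20; // 256 MiB/s, an assumed figure
        long[] nurserySizes = {64L << 20, 512L << 20, 4096L << 20};
        for (long size : nurserySizes) {
            System.out.printf("nursery=%d MiB interval=%.2fs live@0%%=%.0f live@0.1%%=%.0f%n",
                    size >> 20,
                    secondsBetweenCollections(size, allocRate),
                    liveBytesAtCollection(size, inFlight, 0.0),
                    liveBytesAtCollection(size, inFlight, 0.001));
        }
    }
}
```

With a survival rate of exactly 0 the live bytes per collection do not depend on nursery size, which is the parent's equilibrium. With any positive rate they grow linearly with the nursery, although in this model the survivor bytes per allocated byte stay the same.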

If you had a perfect situation like that, with a giant nursery, you wouldn't even need to GC anything. When the nursery is full, just start over from address 0 and you can be confident that when the new objects start overwriting the old ones, the old ones will already be unreachable from the object graph.
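As a thought experiment only, the "start over from address 0" idea is a bump allocator that wraps around. The sketch below is hypothetical (no real JVM works this way) and is only safe under the 0% survival assumption: it never checks whether the memory it reuses is still reachable.

```java
/** Toy wrap-around bump allocator; purely illustrative and unsafe without 0% survival. */
final class WrapAroundNursery {
    private final int capacityBytes;
    private int top = 0;   // next free offset
    private int wraps = 0; // how many times we restarted from offset 0

    WrapAroundNursery(int capacityBytes) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacityBytes = capacityBytes;
    }

    /** Returns the offset of the new block, restarting at 0 when the nursery is full. */
    int allocate(int sizeBytes) {
        if (sizeBytes <= 0 || sizeBytes > capacityBytes) {
            throw new IllegalArgumentException("bad allocation size: " + sizeBytes);
        }
        if (top + sizeBytes > capacityBytes) {
            top = 0; // no tracing, no copying: old contents are assumed dead
            wraps++;
        }
        int offset = top;
        top += sizeBytes;
        return offset;
    }

    int wraps() {
        return wraps;
    }
}
```

For example, with a capacity of 100 bytes, three 40-byte allocations land at offsets 0, 40, and 0 again, silently overwriting the first block.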

You never reach that situation in reality. Even in a simple web server, some request-handling thread might do something innocuous like setting a key in a cache hash somewhere, leading to the hash being full and needing to be reallocated. That would mark one card dirty. And again, the longer the duration, the more of these "freak" events you get. Or there may be a string somewhere that keeps track of the current date, and when it ticks over from "July 31st, 2015" to "August 1st, 2015" it triggers a reallocation because the new string is one character longer.
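The cache case might look like the sketch below. The class, the handler, and the key format are hypothetical; the only library behaviour relied on is that a `java.util.HashMap` created with capacity 16 and the default load factor of 0.75 grows its table once it holds more than 12 entries.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical request handler that writes into a long-lived cache. */
final class CacheWriteExample {
    // Long-lived map: after surviving a few collections it would normally
    // end up in the old generation.
    private static final Map<String, String> CACHE = new HashMap<>(16);

    static String handleRequest(int requestId) {
        String key = "session-" + requestId; // freshly allocated, so it starts in the nursery
        CACHE.put(key, "state");             // an old object now references young objects
        return key;
    }

    static int cacheSize() {
        return CACHE.size();
    }

    public static void main(String[] args) {
        // The 13th insertion exceeds the threshold of 12 and forces the
        // map to allocate a new, larger table.
        for (int i = 0; i < 13; i++) {
            handleRequest(i);
        }
        System.out.println(cacheSize());
    }
}
```

Note that in this sketch every `put` of a new key, not only the resizing one, stores a reference to a young object into the map's internal structure, so such writes may be less of a freak event than the resize alone suggests.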

It may be that having a large nursery is a good trade-off because for many loads it's the same cards being marked over and over again. That may outweigh the increased frequency of tenured generation collections (memory isn't free so you must take the space from somewhere).

Not an expert, but this is my experience/basic understanding:

The old generation gets a lot more expensive to collect as it gets bigger, and I think collecting it requires at least some stop-the-world pauses with all the collectors in HotSpot.

New generation collections often remain quick as the size grows, as long as most objects are dying young. Increasing the size of new also gives more opportunity for objects to die before being promoted (if you have lots of objects that live just long enough to be promoted, it can be a good strategy to increase the size of new). New can be collected in parallel by multiple GC threads.