by legoviking 4080 days ago
My understanding is that an entry in the card table is set when an object in the young generation is allocated that is referenced by something in the old generation. An entry in the card table corresponds to a 512-byte segment of memory in the old generation. Thus, the cost this imposes depends on how many distinct 512-byte segments of the old generation reference objects in the young generation.
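To make the 512-byte granularity concrete, here is a minimal sketch of a toy card table. It is not how any real JVM implements it; the class name, the method names, and the idea of addressing the old generation by byte offset are all assumptions made for illustration.

```java
import java.util.BitSet;

/** Toy card table: one dirty bit per 512-byte segment ("card") of a simulated old generation. */
class CardTableSketch {
    // 2^9 = 512 bytes per card, the segment size quoted above.
    static final int CARD_SHIFT = 9;

    private final BitSet dirty;

    CardTableSketch(long oldGenBytes) {
        // Round up so a partial trailing segment still gets its own entry.
        int cards = (int) ((oldGenBytes + (1L << CARD_SHIFT) - 1) >>> CARD_SHIFT);
        dirty = new BitSet(cards);
    }

    /** Hypothetical hook: the old-generation location at this offset now references a young object. */
    void markCardFor(long oldGenOffset) {
        dirty.set((int) (oldGenOffset >>> CARD_SHIFT));
    }

    /** Number of distinct 512-byte segments a young collection would have to scan. */
    int dirtyCardCount() {
        return dirty.cardinality();
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(1 << 20); // 1 MiB simulated old generation
        table.markCardFor(0);
        table.markCardFor(100);    // same card as offset 0
        table.markCardFor(512);    // next card
        table.markCardFor(70_000); // a card further along
        System.out.println(table.dirtyCardCount()); // prints 3
    }
}
```

Many references from the same 512-byte segment cost one entry, which is why the count of distinct segments matters rather than the count of references.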

If you have a web service that mostly consists of some baseline of long-lived objects and many short-lived objects used for fulfilling requests, I would expect relatively few GC roots. At that point, if you assume a consistent request rate, I would expect the number of reachable objects in the young generation to remain constant regardless of the size of the young generation, and the number of GC roots should also remain constant. Based on that, increasing the young generation size would decrease the frequency of young generation garbage collection, reduce the probability of survivors getting promoted to the old generation, and have no effect on the time it takes to do a young generation garbage collection. There are certainly applications that behave differently, where the old generation is less static, but I would think that for this use case the young generation should be as big as it can be.
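The argument above can be written down as a back-of-the-envelope model. This is only a sketch under the stated assumptions (constant allocation rate, and a pause per young collection that does not depend on the young generation size); the class and method names are made up for illustration, and any numbers fed to it are not measurements.

```java
/** Steady-state model of young collections under a constant allocation rate. */
class YoungGenModel {
    /** Seconds between young collections: the young generation fills at the allocation rate. */
    static double secondsBetweenCollections(double youngGenBytes, double allocBytesPerSec) {
        return youngGenBytes / allocBytesPerSec;
    }

    /**
     * Fraction of wall-clock time spent in young collections, assuming each
     * collection takes the same fixed pause regardless of young generation size.
     */
    static double gcTimeFraction(double youngGenBytes, double allocBytesPerSec, double pauseSec) {
        double interval = secondsBetweenCollections(youngGenBytes, allocBytesPerSec);
        return pauseSec / (interval + pauseSec);
    }
}
```

Under these assumptions, doubling `youngGenBytes` doubles the interval between collections while the pause stays the same, so the fraction of time spent collecting goes down.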

If something I've said is incorrect or incomplete, I'm eager to know. There are relatively few well-written explanations of how Java garbage collection works, so, as you have said, it is difficult to be confident about it without a lot of practical experience.

1 comments

That's a good explanation of why I'm wrong. Basically you are hoping to reach an equilibrium in which 0% of the memory allocated in the nursery consists of true survivors. If the true survival rate were higher than 0%, then the larger the nursery, the longer the interval between collections and the higher the number of objects that are true survivors.

If you had a perfect situation like that, with a giant nursery, you wouldn't even need to GC anything. When the nursery is full, just start over from address 0, and you can be confident that by the time the new objects start overwriting the old ones, the old ones will already be unreachable from the object graph.
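As a sketch of that thought experiment, a bump-pointer allocator that simply wraps to address 0 when full might look like the following. This is purely hypothetical and is only safe under the unrealistic assumption above that nothing old is still reachable; the class and method names are invented for illustration.

```java
/** Bump-pointer allocator over a fixed region that wraps to address 0 instead of collecting. */
class WrappingBumpAllocator {
    private final long capacity;
    private long top = 0;   // next free address
    private long wraps = 0; // how many times we started over from address 0

    WrappingBumpAllocator(long capacityBytes) {
        this.capacity = capacityBytes;
    }

    /** Returns the start address of the new object; old contents there are silently overwritten. */
    long allocate(long size) {
        if (size <= 0 || size > capacity) {
            throw new IllegalArgumentException("size must be in 1.." + capacity);
        }
        if (top + size > capacity) {
            // Nursery is full: no collection, just start over from address 0.
            top = 0;
            wraps++;
        }
        long address = top;
        top += size;
        return address;
    }

    long wraps() {
        return wraps;
    }
}
```

Any object that survives longer than one trip around the region gets overwritten while still reachable, which is exactly where the scheme breaks down.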

You never reach that situation in reality. Even in a simple web server, some request-handling thread might do something innocuous like setting a key in a cache hash somewhere, leading to the hash becoming full and needing to be reallocated. That would mark one card dirty. And again, the longer the interval, the more of these "freak" events you get. Or there may be a string somewhere that keeps track of the current date, and when it ticks over from "July 31st, 2015" to "August 1st, 2015" it triggers a reallocation because the new string is one character longer.

It may be that having a large nursery is a good trade-off because for many loads it's the same cards being marked over and over again. That may outweigh the increased frequency of tenured generation collections (memory isn't free so you must take the space from somewhere).