HN Mirror



	by firefly2000 111 days ago
	Are there plans to elucidate implicit GC costs as well?


jonasn 111 days ago

Great question! I actually just touched on this in another thread that went up right around the same time you asked this. It is clearly the next big frontier!

The short answer is: It's something I'm actively thinking about, but instrumenting micro-level events (like ZGC's load barriers or G1's write barriers) directly inside application threads without destroying throughput (or creating observer effects invalidating the measurements) is incredibly difficult.

magicalhippo 110 days ago

> instrumenting micro-level events (like ZGC's load barriers or G1's write barriers) directly inside application threads without destroying throughput (or creating observer effects invalidating the measurements) is incredibly difficult

I've used a sampling profiler with success to find lock contention in heavily multithreaded code, but I guess there are some details that make it not viable for this?

firefly2000 111 days ago

Do you think it can be done by adjusting GC aggressiveness (or even disabling it for short periods of time) and correlating it with execution time?

jonasn 111 days ago

That is spot on. Effectively disabling GC to establish a baseline is exactly the methodology used in the Blackburn & Hosking paper [1] I referenced.

In general, for a production JVM like HotSpot, the implicit cost comes largely from the barriers (instructions baked directly into the application code). So even if we disable GC cycles, those barriers are still executing.

If we were to remove barriers during execution, maintaining correctness becomes the bottleneck. We would need a way to ensure we don't mark a live (reachable) object as dead the moment we re-enable the collector.

[1] https://dl.acm.org/doi/pdf/10.1145/1029873.1029891

babol 110 days ago

Would running an application with a chosen GC, subtracting the GC time reported by the methods you introduced, and then comparing with an Epsilon-based run be a good estimate of barrier overhead?

Thank you for the well-written article!

jonasn 110 days ago

That is a creative idea, but unfortunately, Epsilon changes the execution profile too much to act as a clean baseline for barrier costs.

One huge issue is spatial locality. Epsilon never reclaims memory, whereas other GCs reclaim and reuse memory blocks. This means their L2/L3 cache hit rates will be fundamentally different.

If you compare them, the delta wouldn't just be the barrier overhead; it would be the barrier overhead mixed with completely different CPU cache behaviors, memory layout etc. The GC is a complex feedback loop, so results from Epsilon are rarely directly transferable to a "real" system.
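To make the arithmetic concrete, here is a minimal Python sketch of the estimate proposed above and of why the delta is confounded. Everything in it is an assumption for illustration: the `Run` type, the function names, and the numbers are hypothetical, and treating the cache/layout difference as a single additive `locality_effect` term is a simplification (in practice that term is not directly measurable, which is the whole problem).

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Measurements from one benchmark run (hypothetical values, in seconds)."""
    wall_time: float         # end-to-end execution time
    reported_gc_time: float  # explicit GC time reported for the run (0 for Epsilon)


def naive_barrier_estimate(gc_run: Run, epsilon_run: Run) -> float:
    """The proposed estimate: strip explicit GC time, then diff against Epsilon."""
    mutator_time = gc_run.wall_time - gc_run.reported_gc_time
    return mutator_time - epsilon_run.wall_time


def barrier_given_locality(naive_delta: float, locality_effect: float) -> float:
    """Barrier cost implied by the delta under an ASSUMED additive model.

    locality_effect > 0: the real collector's mutator ran slower than
    Epsilon's for cache/layout reasons; < 0: it ran faster.
    """
    return naive_delta - locality_effect


if __name__ == "__main__":
    delta = naive_barrier_estimate(Run(10.0, 1.5), Run(8.0, 0.0))
    print("naive delta:", delta)
    # The same delta is consistent with very different barrier costs,
    # depending on an unknown locality term:
    for locality in (-0.5, 0.0, 0.25):
        print("locality", locality, "-> barrier", barrier_given_locality(delta, locality))
```

Since the locality term is unknown and can have either sign, the naive delta alone does not pin down the barrier overhead.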
