| HN Mirror

It all depends on what time scale we're talking about since "very fast" is relative. High performance native systems don't use (in any meaningful manner) naive malloc/free, so that comparison is somewhat moot. I hear this argument quite often when Java vs C++/C is discussed, but it's not comparing idioms/techniques in actual use.

Also don't forget that when GC runs it trashes your d/i-caches; temporaries/garbage allocs reduce your d-cache efficacy; GC must suspend and resume the java threads, which is trips to the kernel scheduler; there are some pathologies with Java threads reaching/detecting safepoints.

GC store barriers (aka card marking) don't have anything to do with thread contention (apart from one thing, which I'll note later). This is a commonly used technique to record old->young gen references, and serves as a way to reduce the set of roots when doing young GC only (i.e. you don't need to scan the entire heap). So this isn't about thread contention, per say -- with the exception that you can get false sharing due to an implementation detail, such as in Oracle's Hotspot.

The card table is an array of bytes. Each heap object's address can be mapped into a byte in this array. Whenever a reference is assigned, Hotspot needs to mark the byte covering the referrer as dirty. The false sharing comes about when different threads end up executing stores where the objects requiring a mark end up mapping to bytes that are on the same cacheline - fairly nasty if you hit this problem as it's completely opaque. So Hotspot has a XX:+UseCondCardMark flag that tries to neutralize this by first checking if the card is already dirty, and if so, skips the mark; as you can imagine, this inserts an extra compare and branch into the existing card marking code - no free lunch.