Hacker News new | ask | show | jobs
by stcredzero 5282 days ago
What are the long term implications of the increasing amount of parallel computing resources available to programmers? Are we getting to a point where there is enough excess CPU available, that the extra instruction on every assignment for reference counting is no big deal? (In certain contexts. In some contexts extra instructions are always a big deal, but these don't span all of computing.) Combine that with incremental algorithms for cycle reclamation, and you'd have great low-latency GC.
2 comments

It's still not trivial even with massive parallelism. Performance is easily killed if you:

1) Completely saturate the memory interconnect between the processors and RAM chips

2) If your parallel collector threads are touching the same memory that your worker threads are, you're going to have some cache contention

3) One of the big challenges with cache lines that have writes in them is if they are shared between processors. For example, create two ref cells and have two parallel threads set them -- if the compiler did not happen to pad them out to cache line size, they'll be fighting for the same 64 bytes (8 64-bit words/pointers) of memory

We (Manticore) have done a lot of work to isolate the per-CPU work, heap pages, etc. (http://arxiv.org/abs/1105.2554 , MSPC 2011 proceedings should be out on acm.org soon-ish). But it's not easy and requires rearchitecting your whole compiler and runtime system. GHC is also in the middle of some similar work, and they have a much harder problem both due to the challenges of implementing laziness and they have a full-featured system and real user base, so they can't just change things willy-nilly.

Further, once you get the parallelism right, you end up in a different situation: Amdahl's Law bites. Even if 90% of your program is amenable to parallel work, if you make that portion go infinitely fast, you still have a sequential portion that takes 10% of the time and therefore limits you to a maximum of 10x speedup.

We are hitting that limitation right now in our research and going back to do more sequential performance work. I've got a small army of undergraduates implementing compiler optimizations and analyses :-)

The the object can be accessed (not modified, just accessed) from multiple threads, the extra instruction needs to be a thread safe atomic increment, which is hugely expensive, and becomes more expensive the more cores you have, especially in a NUMA architecture.

Also, when an object is freed, you need to decrement the reference count of every single object it points to, even objects that are referenced from many other places and clearly won't be reclaimed for a long time.