| The magic of RCU definitely is the safety guarantee on read-side with little to no overhead and an easy to approach API. Traditional copy-on-write systems do not provide this guarantee as there would be synchronization and / or blocking on the fast path (barriers / atomic operations / locks). Cores versus threads is a detail of the execution environment which may affect RCU implementation but not the actual magic. For example, a lot of work was necessary to get RCU to scale to many hundreds of cores in the kernel (http://lwn.net/Articles/305782/ is one example). For ck_epoch, we had to implement a proxy collection-like pattern for workloads involving thousands of concurrent events that required longer-lived hazardous references (thanks to John Esmet for this work)...this will hit upstream as ck_epc and is sitting in a pull request on the github page (http://github.com/concurrencykit/ck). re:Efficiency, magnitudes more efficient on read-side in all ways (including throughput and latency) assuming that delete frequency is not too heavy and that you can afford write-side blocking. Consider that the read-side has few to none atomic operations on the fast path, and if so, is typically on exclusively written memory and it does not block in any fashion. This means readers scale and are very fast. You might enjoy the paper I linked to which exposes some of the performance characteristics (it isn't exhaustive, it doesn't explore real-world implications on memory usage as update frequency increases, for example, but it's a great start). Paul also has some refreshed performance analysis in his ACM Queue article. If you couple this with specialized concurrently readable data structures (like http://backtrace.io/blog/blog/2015/03/13/workload-specializa...), you get super sexy performance for readers. You might also be interested in "passive reader-writer locks", which provide similar costs on the fast path to RCU for TSO architectures while still providing read-write lock semantics...but at the cost of exceptionally more expensive write-side synchronization (and user-space support requires some kernel modifications). |
More efficient than what? RCU with a general-purpose GC? I thought that's what we were talking about.
In our in-memory database (written in Java), we use what you call RCU (I had no idea any copy-on-write without any synchronization on the read side is called RCU) where we can for read-mostly data, but the problem is that in some data structures, nodes may be referenced by two references (or more), so since we don't have DCAS, we can't do the atomic replacement of the node (in some circumstances we can forgo the transactional update of on one of the references, so we can do RCU). Another problem is that some writes must block reads to ensure transaction isolation. Instead, to reduce the cost of reads updating a read-write lock, we use optimistic locking that also ensures a reader requires no memory writes, but may retry due to a concurrent writer.