|
|
|
|
|
by pron
4026 days ago
|
|
> The efficiency stems from having little to zero synchronization cost for readers in a read-mostly workload or workload that just isn't update-intensive. That's the advantage of any copy-on-write concurrency mechanism. The "magic" of RCU (unless you define any copy-on-write algorithm as RCU) is the indifference of the algorithm complexity to the number of threads involved -- only the number of cores. I am not aware of userspace algorithms that display this property, but I may be wrong about that. > All the blocking schemes are magnitudes more efficient but the trade-off is that it isn't generally applicable. But that's where things get tricky. Magnitudes more efficient how? Total throughput? Maximum latency? I can certainly believe the latter but I doubt the former. Hazard-pointer techniques basically are how modern GCs work, with the difference of when the scans are triggered. |
|
Cores versus threads is a detail of the execution environment which may affect RCU implementation but not the actual magic. For example, a lot of work was necessary to get RCU to scale to many hundreds of cores in the kernel (http://lwn.net/Articles/305782/ is one example).
For ck_epoch, we had to implement a proxy collection-like pattern for workloads involving thousands of concurrent events that required longer-lived hazardous references (thanks to John Esmet for this work)...this will hit upstream as ck_epc and is sitting in a pull request on the github page (http://github.com/concurrencykit/ck).
re:Efficiency, magnitudes more efficient on read-side in all ways (including throughput and latency) assuming that delete frequency is not too heavy and that you can afford write-side blocking. Consider that the read-side has few to none atomic operations on the fast path, and if so, is typically on exclusively written memory and it does not block in any fashion. This means readers scale and are very fast. You might enjoy the paper I linked to which exposes some of the performance characteristics (it isn't exhaustive, it doesn't explore real-world implications on memory usage as update frequency increases, for example, but it's a great start). Paul also has some refreshed performance analysis in his ACM Queue article. If you couple this with specialized concurrently readable data structures (like http://backtrace.io/blog/blog/2015/03/13/workload-specializa...), you get super sexy performance for readers.
You might also be interested in "passive reader-writer locks", which provide similar costs on the fast path to RCU for TSO architectures while still providing read-write lock semantics...but at the cost of exceptionally more expensive write-side synchronization (and user-space support requires some kernel modifications).