Hacker News new | ask | show | jobs
by slobdell 3549 days ago
thanks for posting...I will try the implementation you linked to and see the effects on performance.

Channels end up locking under the hood as well, so from a purity standpoint I wanted to avoid those...if channels are indeed faster, then clearly I need to rework my approach.

3 comments

Locking isn't the problem. Contention is the problem.

Even "lock free" algorithms can still not scale due to contention. Read this paper for a good summary:

http://queue.acm.org/detail.cfm?id=2991130

In short, a queue is made up of a linked list, where each entry is itself a circular buffer. Insertions go into a circular buffer... unless there's contention, in which case the next circular buffer is used.

The result is that insertions are mostly not contented. Removals are mostly not contented. And it scales very well.

If you want to push the performance of that approach (a single goroutine prioritising your messages for you a queue in and a queue out) I have built a set of lock-free queues on a ringbuffer.

https://github.com/fmstephe/flib/tree/master/queues/spscq

Those are only single producer single consumer (spsc) so not flexible enough for your needs right now, but I am working on expanding into multiproducer/multiconsumer variants which could be a good fit.

The single producer/consumer queues get up to 100 million messages per second (on microbenchmarks) and I expect the multi variants to be above 10-20 million per second.

I also have a handbuilt redblack tree which does around 10 million inserts/pops per second

https://github.com/fmstephe/matching_engine/tree/master/matc...

Although that is very specialised for another purpose, I would be happy to try stripping it down if you wanted that. But, start with the LLRB in that gist.

My experience is that you can lock and unlock a golang mutex in around 130ns, do a cas in around 60ns, and a fetch and increment in around 30ns. So building something atop the atomic primitives is potentially faster, but the difference is not so dramatic that you shouldn't try a simple implementation first.
Lock free structures typically have worse average or even uniformly worse performance. The issue is what happens under contention or predictability. Lock freedom guarantees that at least one process makes progress whereas with locks a process can acquire a lock and get switched between cores or get GC paused or whatever.