|
|
|
|
|
by maypok86
905 days ago
|
|
In fact, lock-free queues have several problems at once, which prompted me to give up on them almost immediately. 1. Yes, S3-FIFO can be implemented using lock-free queues, but the problem is that each write to a filled cache using this design will cause a large number of additional atomic operations not friendly to the processor's cache, while bp-wrapper on the contrary amortizes this load. And reading with frequency update on hot entries can have a bad effect on performance. In many ways this is exactly what the last posts in my discussion with Ben are about (not really about this, but the current problem with otter read speed is caused by a similar problem). https://github.com/Yiling-J/theine-go/issues/29#issuecomment... 2. But the main problem for me is not even that. Lock-free queues work fine as long as you only need to support Get and Set operations, but as soon as you want to add features to your cache, the complexity of the implementation starts to increase, and some features are very hard to add to such a structure. Also, improving the eviction policy is under a big question mark, because not only do you have to think about how to improve the eviction policy, but also how to avoid locks while doing so or how not to slow down the implementation with your improvements. BP-Wrapper has no such problems at all, allows you to use any eviction policy and focus on improving different parts of your cache independently of each other. |
|
1. why does it require multiple atomic operations at each insert? Our implementation in cachelib only used one CAS operation. I will get back to you with an illustration. Maybe you can show me something I missed.
2. I like the bp-wrapper idea. It is intuitive and can be used with most/all algorithms. Batching does reduce/amortize the overhead of locking for LRU-based algorithms. But IIRC, all the promotions are still serialized. Because the promotions are still protected by the lock, and only one core can promote objects at the same time. Caffeine achieves amazing scalability because it drops all the elements that need to be updated/promoted under high contention, which means it effectively becomes a FIFO. I don't think this is the correct way to claim good scalability.
3. "reading with frequency update on hot entries can have a bad effect on performance" I don't understand this. The frequency of hot entries should quickly reach 1 (if using a one-bit counter) or 3 (if using a two-bit counter) and will not be updated anymore. Why would read and update conflict?
4. Yes, implementing get and set with locking is trivial; adding support for delete requires a bit of work (10s lines) but should be manageable. But why would lock-free queues make other features hard to implement? Can you give an example?