| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by maypok86 905 days ago

In fact, lock-free queues have several problems at once, which prompted me to give up on them almost immediately.

1. Yes, S3-FIFO can be implemented using lock-free queues, but the problem is that each write to a filled cache using this design will cause a large number of additional atomic operations not friendly to the processor's cache, while bp-wrapper on the contrary amortizes this load. And reading with frequency update on hot entries can have a bad effect on performance. In many ways this is exactly what the last posts in my discussion with Ben are about (not really about this, but the current problem with otter read speed is caused by a similar problem). https://github.com/Yiling-J/theine-go/issues/29#issuecomment...

2. But the main problem for me is not even that. Lock-free queues work fine as long as you only need to support Get and Set operations, but as soon as you want to add features to your cache, the complexity of the implementation starts to increase, and some features are very hard to add to such a structure. Also, improving the eviction policy is under a big question mark, because not only do you have to think about how to improve the eviction policy, but also how to avoid locks while doing so or how not to slow down the implementation with your improvements. BP-Wrapper has no such problems at all, allows you to use any eviction policy and focus on improving different parts of your cache independently of each other.

1 comments

1a1a11a 905 days ago

I like the discussions with you. I have a few comments and questions.

1. why does it require multiple atomic operations at each insert? Our implementation in cachelib only used one CAS operation. I will get back to you with an illustration. Maybe you can show me something I missed.

2. I like the bp-wrapper idea. It is intuitive and can be used with most/all algorithms. Batching does reduce/amortize the overhead of locking for LRU-based algorithms. But IIRC, all the promotions are still serialized. Because the promotions are still protected by the lock, and only one core can promote objects at the same time. Caffeine achieves amazing scalability because it drops all the elements that need to be updated/promoted under high contention, which means it effectively becomes a FIFO. I don't think this is the correct way to claim good scalability.

3. "reading with frequency update on hot entries can have a bad effect on performance" I don't understand this. The frequency of hot entries should quickly reach 1 (if using a one-bit counter) or 3 (if using a two-bit counter) and will not be updated anymore. Why would read and update conflict?

4. Yes, implementing get and set with locking is trivial; adding support for delete requires a bit of work (10s lines) but should be manageable. But why would lock-free queues make other features hard to implement? Can you give an example?

link

maypok86 903 days ago

1. Here I don't understand how you can do with one cas on a filled cache. You need to evict items on every insert, reinsert items into the main queue, etc., and if you add support for item weights, the number of xadd and cas operations will be even greater.

2. It's a bit non-obvious here. Caffeine does not lose a single insert, update, delete operation and moreover, keeps track of their correct execution with fantastic scrupulosity. Also the loss of some reads is inherent in many eviction policies. And caffeine's lossy buffers also have a great ability: the number of these buffers is automatically adjusted at runtime based on contention, which minimizes losses.

3. I specifically mentioned here that it's not the same, but it is possible to design a load that will force as many xadd operations as possible and as few load operations as possible. But in principle you can ignore this point, as it is rather artificial.

4. Here I rather mean that all features should live in the lock-free model or somehow be separately integrated into the architecture (which only complicates things). Plus, for example, a person wants to modify the eviction policy a bit, for example, by adding the lfu part to it and then it's not clear how to do it.

link

1a1a11a 902 days ago

1. I wrote a bit on how to implement lock-free FIFO queues here https://blog.jasony.me/system/cache/2023/12/28/fifo, let me know if any part is not clear or not correct. Moving an object from the small queue to the main queue can be viewed as two operations: evicting the item from the small queue and inserting it into the main queue.

2. I am not criticizing the design decision in Caffeine. It is good because most people just don't need crazy scalability (>16 threads) or do not run it at such high throughput. I understand that critical operations are not lost. But the promotions are lost under contention, which makes comparison unfair. S3-FIFO is still the same algorithm under contention, but The W-TinyLFU in Caffeine becomes FIFO under high contention. I do not think that it can claim a low miss ratio and good scalability at the same time. The relationship is rather a low miss ratio OR good scalability. But as I mentioned, it does not matter for most use cases, and the design is elegant.

3. skipped

4. It is true that if one wants to port a lock-based new algorithm, it would require extra effort. But I guess this is a decision you have to make: only support lock-free eviction algorithms (FIFO-based) or support more algorithms with locks. But I do want to mention that S3-FIFO is not the only lock-free algorithm and there are and will be more.

link