| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by NovaX 1892 days ago

> However, if this design is used as a local cache over shared memory for a high throughput system (trading? ML feature store?) the scalability can be handy.

In that case, one might be concerned about the hit rates. While FIFO & LRU have been shown to work very well for a remote cache, especially in social network workloads, it is a poor choice in many other cases. Database, search, and analytical workloads are LFU & MRU biased due to record scans. I'd be concerned that Segcache's design is not general purpose enough and relies too heavily on optimizations that work for Twitter's use cases.

Unfortunately as applications have dozens of local caches, they are rarely analyzed and tuned. Instead implementations have to monitor the workload and adapt. Popular local caches can concurrently handle 300M+ reads/s, use an adaptive eviction policy, and leverage O(1) proactive expiration. As they are much smaller, there is less emphasis on minimizing metadata overhead and more on system performance, e.g. using memoization to avoid SerDe roundtrips to the remote cache store. See for example Expedia using a local cache to reduce their db reads by 50% which allowed them to remove servers, hit SLAs, and absorb spikes (at a cost of ~500mb) [1].

[1] https://medium.com/expedia-group-tech/latency-improvement-wi...

1 comments

1a1a11a 1892 days ago

I might be dumb about estimating throughput. According to https://github.com/Cyan4973/xxHash, the best hash function can only do 100s M hashes per second, how can a local cache run at such throughput? I assume when measuring cache throughput, one need to calculate hash, look up, (maybe compare keys), and copy the data.

link

NovaX 1892 days ago

Are you comparing a single threaded hash benchmark to a multithreaded cache benchmark? An unbounded concurrent hash table has 1B reads/s on a 16 core machine (~60M ops/s per thread)

link

1a1a11a 1891 days ago

Oh, my bad, thank you for pointing out! BTW, just for my curiosity, where is the 1B reads/s benchmark?

link

NovaX 1890 days ago

A multi-threaded benchmark of a cache should be fully populated and use a scrambled Zipfian distribution. This emulates hot/cold entries and highlights the areas of contention (locks, CASes, etc). A lock-free read benefits thanks to cpu cache efficiency causing super linear growth.

This shows if the implementation could be a bottleneck and scales well enough, after which the hit rate and other factors are more important than raw throughput. I would rather sacrifice a few nanos on a read than suffer much lower hit rates or have long pauses on a write due to eviction inefficiencies.

[1] https://github.com/ben-manes/caffeine/wiki/Benchmarks#read-1...

[2] https://github.com/ben-manes/caffeine/blob/master/caffeine/s...

link