by annowiki 886 days ago
This is not really "sieve-ing" per the article, but what prevents me from running another process that periodically queries the data in a cache? For example, a Celery queue in Python that continually checks the cache for out-of-date information and updates it? Is there a word for this? Is this a common technique?
4 comments

I think this is not as simple, because to achieve good metrics (latency, cache hit rate) you would need to predict the actual incoming query load, which is quite hard. Letting the query load itself set the values is the state of the art.

In some ways, caching can be seen as a prediction problem, and the cache hit rate measures how well the prediction did, since at time T we only have the lagging history to go on. Blending load over time is effectively what these various cache algorithms do to avoid overfitting.

If you have an idea of what you need to cache, or can fit everything into the cache, it's extremely effective.

Though just refreshing out-of-date data in the cache could potentially increase effectiveness, given that the general assumption of a cache is that what's in it will probably be used again.

I called it a periodically refreshing cache when I wrote one. Not sure if there is a more formal name.
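
For illustration, here is a minimal sketch of what a periodically refreshing cache might look like. It is not the implementation mentioned above: the class name, `loader`, `max_age`, and `interval` are all hypothetical, and a real deployment would more likely put this in front of a shared cache than an in-process dict.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple


class PeriodicRefreshCache:
    """In-process cache whose out-of-date entries are reloaded by a sweep."""

    def __init__(self, loader: Callable[[str], Any], max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader    # hypothetical function that fetches a fresh value
        self._max_age = max_age  # seconds before an entry counts as out of date
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, loaded_at)

    def get(self, key: str) -> Any:
        with self._lock:
            hit = self._entries.get(key)
        if hit is not None:
            return hit[0]  # served as-is, even if stale; the sweep fixes it later
        value = self._loader(key)  # cold miss: load synchronously
        with self._lock:
            self._entries[key] = (value, self._clock())
        return value

    def refresh_stale(self) -> int:
        """Reload every entry older than max_age; return how many were reloaded."""
        now = self._clock()
        with self._lock:
            stale = [k for k, (_, t) in self._entries.items()
                     if now - t >= self._max_age]
        for key in stale:
            value = self._loader(key)  # outside the lock so readers are not blocked
            with self._lock:
                self._entries[key] = (value, self._clock())
        return len(stale)

    def start(self, interval: float) -> threading.Event:
        """Run refresh_stale every `interval` seconds on a daemon thread."""
        stop = threading.Event()

        def loop() -> None:
            while not stop.wait(interval):
                self.refresh_stale()

        threading.Thread(target=loop, daemon=True).start()
        return stop  # call .set() on this to stop the sweep
```

Note that the sweep reloads every stale key whether or not anyone still asks for it, which is the cost the replies above point at: it only pays off if the cached keys really are going to be requested again.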

You might call that prefetching. That's what Unbound calls it when it returns a near-expired cached entry to a client and also contacts upstream to update its cache. I remember having a similar option in Squid, but it might have been only in an employer's branch (there were a lot of nice extensions that unfortunately didn't make it upstream).
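
A rough sketch of that access-triggered variant, assuming an in-process cache: a hit on a near-expired entry returns the cached value immediately and kicks off a reload in the background. This is not Unbound's or Squid's code; `loader`, `ttl`, `refresh_fraction`, and `schedule` are made-up names for illustration.

```python
import threading
import time
from typing import Any, Callable, Dict, Set, Tuple


def run_in_thread(task: Callable[[], None]) -> None:
    threading.Thread(target=task, daemon=True).start()


class RefreshAheadCache:
    def __init__(self, loader: Callable[[str], Any], ttl: float,
                 refresh_fraction: float = 0.1,
                 clock: Callable[[], float] = time.monotonic,
                 schedule: Callable[[Callable[[], None]], None] = run_in_thread) -> None:
        self._loader = loader                      # hypothetical upstream fetch
        self._ttl = ttl                            # lifetime of an entry in seconds
        self._refresh_fraction = refresh_fraction  # "near expiry" = last 10% of the TTL
        self._clock = clock
        self._schedule = schedule                  # how background work gets run
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._refreshing: Set[str] = set()         # keys with a refresh in flight

    def get(self, key: str) -> Any:
        now = self._clock()
        start_refresh = False
        with self._lock:
            hit = self._entries.get(key)
            fresh = hit is not None and now < hit[1]
            if fresh:
                remaining = hit[1] - now
                near_expiry = remaining <= self._ttl * self._refresh_fraction
                if near_expiry and key not in self._refreshing:
                    self._refreshing.add(key)      # at most one refresh per key
                    start_refresh = True
        if not fresh:
            return self._load(key)                 # missing or expired: load inline
        if start_refresh:
            self._schedule(lambda: self._refresh(key))
        return hit[0]                              # caller never waits on the refresh

    def _load(self, key: str) -> Any:
        value = self._loader(key)
        with self._lock:
            self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def _refresh(self, key: str) -> None:
        try:
            self._load(key)
        finally:
            with self._lock:
                self._refreshing.discard(key)
```

Unlike a blind periodic sweep, only keys that are actually being requested get refreshed here, so the query load still decides what stays hot.
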
You're describing cache maintenance (and cache eviction), a practice for which there are many algorithms (FIFO, LRU, LFU, etc.), including the one the article describes (SIEVE).
I think this is orthogonal to cache maintenance and cache eviction. Instead, this is a background process periodically refreshing the data in the cache to keep it hot.
Refreshing the cache to keep it hot, and deciding how to do it (e.g. which parts we do with the caching layer directly, which parts we do with an external process, what to evict to make room, etc.), are subtopics of cache management.

If I understand you correctly, you're asking if this is different because an external process is involved. I don't see a use in drawing a distinction, and as far as I know, there's no special term for that pattern.

Update: after looking into it, it looks like this cache/architecture pattern is called "refresh-ahead"