Hacker News new | ask | show | jobs
by gregmac 1541 days ago
> For example: instead of memcache/redis, set aside a ~100 MB of memory in your app process for an LRU cache. When an object is requested, hit the DB with an indexed query for just the 'updatedAt' timestamp (should be a sub-10ms query). If it hasn't been modified, return the cached object from memory, otherwise fetch the full object from the DB and update the local cache.

I've never built something with this type of mechanism for a DB query, but it's interesting. I don't think I've ever timed a query like this, but I feel like it's going to be an "it depends" situation based on what fields you're pulling back, if you're using a covering index, just how expensive the index seek operation is, and how frequently data changes. I've mainly always treated it as "avoid round trips to the database" -- zero queries is better than one, and one is better than five.

I also guess it depends on how frequently it's updated: if 100% of the time the timestamp is changed, you might as well just fetch (no caching). Based on all the other variables above, the inflection point where it makes sense to do this is going to change.

Interesting idea though, thanks.

> For bonus points, send an internal invalidation request to any other app instances you have running when an object gets updated. Now you have a fast, scalable, consistent, distributed cache with minimal ops complexity.

Now you have to track what other app servers exist, handle failures/timeouts/etc in the invalidation call, as well as have your app's logic able to work properly if this invalidation doesn't happen for any reason (classic cache invalidation problem). My inclination is at this point you're on the path of replicating a proper cache service anyways, and using Redis/Memcache/whatever would ultimately be simpler.

1 comments

It definitely does depend on various factors, but if your query is indexed, both the SQL DB request and the Redis/Memcache lookup of the full object are likely to be dominated by internal network latency. If your object is large, the DB single-field lookup could easily be faster since you're sending less back over the wire.

In other words, a single-field indexed DB lookup can be treated more like a cache request. Though for heavier/un-indexed queries, your "avoid round trips to the database" advice certainly applies.

With this architecture, the internal invalidation request is just an optimization. It isn't necessary and it doesn't matter if it fails, since you always check the timestamp with a strongly consistent DB read before returning a cached object.