Hacker News
IdentityCache: Improving Performance one Cached Model at a Time (shopify.com)
35 points by jduff 4823 days ago
2 comments

I'm curious: how does Shopify handle database migrations with the cached models? Do you explicitly invalidate the caches when a migration is run?

The reason I ask is that in the .NET/NHibernate world, this is akin to a 2nd-level cache provider. I maintain a Redis caching provider for NHibernate (https://github.com/TheCloudlessSky/NHibernate.Caches.Redis) and I've been trying to find a decent solution to this problem. When NHibernate fetches a cached model with mismatched columns etc., it blows up with an exception.

I like the generational approach that is part of Rails 4, but it really only works for the views. Maybe some sort of generational identity of the model's configuration could be incorporated for cache busting?

Schema migrations are handled by making a hash of the current schema part of the cache key, so yeah, effectively, every time a cached model's underlying table schema changes, all the cached entries become invalid.
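
To make the schema-hash idea concrete, here is a minimal sketch in Python. It is not IdentityCache's actual code: the function names, the key layout, and the choice of MD5 truncated to 8 characters are all assumptions made for illustration. Any change to a table's columns changes the fingerprint, so old entries are simply never looked up again.

```python
import hashlib


def schema_fingerprint(columns):
    """Short, stable hash of a table's columns.

    `columns` is a list of (name, type) pairs. Sorting makes the
    fingerprint independent of column order (an assumption of this sketch).
    """
    canonical = ",".join(f"{name}:{type_}" for name, type_ in sorted(columns))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()[:8]


def cache_key(table, columns, record_id):
    """Build a cache key that embeds the schema fingerprint (hypothetical layout)."""
    return f"cache:{table}:{schema_fingerprint(columns)}:{record_id}"


# Schema before and after a migration that adds a column.
before = [("id", "integer"), ("title", "string")]
after = before + [("price", "decimal")]

old_key = cache_key("products", before, 42)
new_key = cache_key("products", after, 42)
```

After the migration, readers ask for `new_key`, miss, and repopulate from the database. Entries under `old_key` are left for the cache's eviction policy to reclaim.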
One risk of not using expiration at all is that if the database is updated but the after_commit hook doesn't finish (crash, out of resources etc.) then the cached data remains outdated until the record is updated again (which could be never). Setting a generous TTL won't increase your load much, but will let problems like that eventually fix themselves.
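
A rough sketch of that failure mode and how a TTL bounds it, in Python. The `TTLCache` class, its method names, and the injectable `clock` are hypothetical, used only to simulate time passing. The cache is an in-memory dict standing in for memcached.

```python
import time


class TTLCache:
    """Read-through cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be simulated
        self._store = {}    # key -> (value, expires_at)

    def write(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        # What a commit hook would normally call after an update.
        self._store.pop(key, None)

    def fetch(self, key, loader):
        entry = self._store.get(key)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]  # still fresh as far as the cache knows
        value = loader()     # miss or expired: reload from the source
        self.write(key, value)
        return value


# Simulated scenario: the database row changes but the invalidation never runs.
now = [0.0]
db = {"product:42": "old title"}
cache = TTLCache(ttl_seconds=3600, clock=lambda: now[0])

first = cache.fetch("product:42", lambda: db["product:42"])
db["product:42"] = "new title"   # update committed, hook "crashed"
stale = cache.fetch("product:42", lambda: db["product:42"])
now[0] += 3601                   # the TTL elapses
healed = cache.fetch("product:42", lambda: db["product:42"])
```

Without the TTL, `stale` would be served until the record was next updated. With it, staleness is bounded by the TTL, at the cost of one extra database read per key per TTL window.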
Core dev on IdentityCache here. Excellent point, and truth be told we hadn't considered setting an explicit expiry to let those unavoidable problems fix themselves. We do have a finite amount of space in memcached, so the LRU there accomplishes something similar, but gaining complete control over the expiry duration does make sense for the reasons you listed. That said, I think I value the flip side of explicit expiry more: we can use any corrupt or un-updated information for debugging, which we have done and found really useful in the past. We're also, in a way, forced to deal with anything that might interrupt our after_commit hooks instead of letting the problem just go away in a day. Hooks firing is also critical for other services we have (like Elasticsearch) that rely on them, and I'd rather not create other auto-healers for those.

Thanks for the intelligent suggestion!

In some systems, the problem is that there is no guarantee the after_commit hook will always run. This is especially true in multi-server systems where the cache, database and front-end servers are separated. The front-end server can update the database and then completely die (power outage, networking, reboot) before talking to the cache.

On the other hand I can see how you would want to drive out bugs instead of just sweeping them under the carpet with auto-healing...

Indeed: it will never be 100%, so perhaps a very, very long TTL on the keys would be wise. We do end up flushing the cache incrementally if we ever ship a bug by accident or notice there is more in the cached blob than we want, which I think accomplishes the same thing.