Hacker News new | ask | show | jobs
by AznHisoka 4569 days ago
So I actually store the rows as serialized JSON objects in my redis cache, and when a write occurs, I update the JSON representation in the cache (if it's not in the cache, I read it from the DB and store it there). After X hours, I have a daemon that goes through all expired keys, deserializes the JSON, and executes a transaction, updating 500 objects at a time. To prevent possible race conditions, you need a mutex to allow only 1 thread to modify the cache for that row.

Each object has a field that keeps track of what fields were updated, so I can construct an update query from that.

When I read a row from the DB, I first check if it's in the write through cache, if so I retrieve it there.

Of course, this means that all writes go through my cache. There can't be any other process that updates that table that avoids writing to the cache, or else we have database inconsistency.

This improved the load in my DB server by more than ten-fold, as well as the indexing time to my ElasticSearch server.

1 comments

Sounds pretty effective to me. Thanks for sharing this.

Possibly stupid question - how do you retrieve all the expired keys? I was under the impression that once keys expired, they were effectively invisible to clients?

So I actually set a soft time to expire as a value for the key. My daemon reads all objects < this time, and deletes those keys itself. The hard time to expire is set to a time much further than this to prevent the scenario of the daemon exiting for some reason, and memory overflowing.