Hacker News new | ask | show | jobs
by robbles 4569 days ago
What's the algorithm you use for consolidating those repeated writes? I've thought about doing something like this in the past, but was put off by the worry that there might be edge cases where data would be lost (e.g. insufficient writes to trigger a write through, many concurrent writes, etc.)
1 comments

So I actually store the rows as serialized JSON objects in my redis cache, and when a write occurs, I update the JSON representation in the cache (if it's not in the cache, I read it from the DB and store it there). After X hours, I have a daemon that goes through all expired keys, deserializes the JSON, and executes a transaction, updating 500 objects at a time. To prevent possible race conditions, you need a mutex to allow only 1 thread to modify the cache for that row.

Each object has a field that keeps track of what fields were updated, so I can construct an update query from that.

When I read a row from the DB, I first check if it's in the write through cache, if so I retrieve it there.

Of course, this means that all writes go through my cache. There can't be any other process that updates that table that avoids writing to the cache, or else we have database inconsistency.

This improved the load in my DB server by more than ten-fold, as well as the indexing time to my ElasticSearch server.

Sounds pretty effective to me. Thanks for sharing this.

Possibly stupid question - how do you retrieve all the expired keys? I was under the impression that once keys expired, they were effectively invisible to clients?

So I actually set a soft time to expire as a value for the key. My daemon reads all objects < this time, and deletes those keys itself. The hard time to expire is set to a time much further than this to prevent the scenario of the daemon exiting for some reason, and memory overflowing.