Hacker News new | ask | show | jobs
by AznHisoka 4569 days ago
I have been using redis as a write though cache(not just a read cache). So I used to have 1000 concurrent writes to my Postgres db. Obviously this had a negative impact on performance even with modern specs(64gb ram, 8 cores). I realized that a lot of those writes we're updating the same column for the same row but in different threads. So I basically use redis to buffer these queries and perform 1 single write instead of multiple writes for the same row.
1 comments

What's the algorithm you use for consolidating those repeated writes? I've thought about doing something like this in the past, but was put off by the worry that there might be edge cases where data would be lost (e.g. insufficient writes to trigger a write through, many concurrent writes, etc.)
So I actually store the rows as serialized JSON objects in my redis cache, and when a write occurs, I update the JSON representation in the cache (if it's not in the cache, I read it from the DB and store it there). After X hours, I have a daemon that goes through all expired keys, deserializes the JSON, and executes a transaction, updating 500 objects at a time. To prevent possible race conditions, you need a mutex to allow only 1 thread to modify the cache for that row.

Each object has a field that keeps track of what fields were updated, so I can construct an update query from that.

When I read a row from the DB, I first check if it's in the write through cache, if so I retrieve it there.

Of course, this means that all writes go through my cache. There can't be any other process that updates that table that avoids writing to the cache, or else we have database inconsistency.

This improved the load in my DB server by more than ten-fold, as well as the indexing time to my ElasticSearch server.

Sounds pretty effective to me. Thanks for sharing this.

Possibly stupid question - how do you retrieve all the expired keys? I was under the impression that once keys expired, they were effectively invisible to clients?

So I actually set a soft time to expire as a value for the key. My daemon reads all objects < this time, and deletes those keys itself. The hard time to expire is set to a time much further than this to prevent the scenario of the daemon exiting for some reason, and memory overflowing.