by Udo 5344 days ago
> During the last weeks we were more focused on reducing RAM consumption of our databases as that is currently the main driver of cost and operation overhead. <

I can imagine that.

> Regarding your idea: Wouldn't then the Node.js server have to keep the whole user state in memory? <

Yes, but I think it would have several advantages:

(1) The Node.js server code could decide which working sets it keeps in memory based on very simple rules. The details of this would be abstracted away from the application code itself, because the app just issues read and write requests on a user's dataset. So in essence, by splitting up the problem in two, it becomes relatively easy to handle (and optimize) on each end.

(2) You just have to keep the active datasets wired in RAM, and it wouldn't be necessary for the Node server to know whether a user has disconnected recently or not. All it knows is when the data was last accessed, so it can vacate RAM slots that have become stale. Compare this to Redis, which I believe just keeps everything in memory no matter what. So overall RAM usage would probably be considerably lower than with your current setup.

(3) The idea beats "dumb blob caching" such as memcache, because it makes small operations economical. It seems to me that Node is well suited for this kind of task, since it makes it very easy to build small server scripts that handle a huge number of small transactions. This would probably mean you need fewer machines for the same number of users.

(4) I believe it's relatively easy to implement replication and scaling.
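
To make points (1) and (2) a bit more concrete, here is a minimal sketch of such a store in TypeScript. Everything in it is hypothetical: the names `WorkingSetStore` and `evictStale`, the `maxIdleMs` threshold, and passing the clock in as a parameter are my own assumptions for illustration. Loading from and flushing to the background data store is left out entirely.

```typescript
// Hypothetical sketch: WorkingSetStore, Slot and evictStale are illustrative names only.
interface Slot {
  data: Map<string, unknown>;
  lastAccess: number; // timestamp (ms) of the most recent read or write
}

class WorkingSetStore {
  private slots = new Map<string, Slot>();
  private maxIdleMs: number;

  constructor(maxIdleMs: number) {
    this.maxIdleMs = maxIdleMs;
  }

  // Return the slot for a user, creating it if absent, and record the access time.
  private touch(user: string, now: number): Slot {
    let slot = this.slots.get(user);
    if (!slot) {
      // A real server would load the dataset from the background store here.
      slot = { data: new Map<string, unknown>(), lastAccess: now };
      this.slots.set(user, slot);
    }
    slot.lastAccess = now;
    return slot;
  }

  read(user: string, key: string, now: number = Date.now()): unknown {
    return this.touch(user, now).data.get(key);
  }

  write(user: string, key: string, value: unknown, now: number = Date.now()): void {
    this.touch(user, now).data.set(key, value);
  }

  // Drop every dataset not accessed within maxIdleMs and return the evicted user IDs.
  // The store never needs to know whether a user has disconnected.
  evictStale(now: number = Date.now()): string[] {
    const evicted: string[] = [];
    this.slots.forEach((slot, user) => {
      if (now - slot.lastAccess > this.maxIdleMs) {
        this.slots.delete(user);
        evicted.push(user);
      }
    });
    return evicted;
  }

  get residentCount(): number {
    return this.slots.size;
  }
}
```

The app only ever calls `read` and `write`. Something like a timer inside the server would call `evictStale` periodically, and a real version would have to flush dirty data to the background store before dropping a slot.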

Anyway, just an idea. I have no clue whether this works in practice ;-)


Yes, this might work. But I would be careful about replication and scaling - this could make things somewhat complicated. ;-)

Not necessarily:

Replication is basically just a provision for instant failover. Let's say that by policy the background data store (e.g. MySQL) always has a copy that is at most 10 minutes old. In practice it could probably be much more recent. So in general user data is safe but you want something very simple to prevent data loss and service disruption in the most common failure scenarios.

I believe the best paradigm is a replication buddy system between two given Node instances. Should one instance fail, the app can always issue the same request to its "replication buddy" and expect to get the same data. Implementing a replication buddy relationship between two instances should be relatively easy using a persistent connection between them, since Node is non-blocking but its code is still guaranteed to execute sequentially (i.e., there will be no real consistency problem). The instances could just notify each other in the background when data changes, and they'd both always have the same data state. Granted, there would have to be some code to take care of what happens in different failure modes (probably the most complex aspect of the whole thing), but overall it is still very doable.
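
A rough sketch of the buddy idea, in TypeScript. This is an assumption-heavy illustration rather than a design: `BuddyNode`, `Change` and `readWithFailover` are made-up names, and the persistent connection between the two instances is replaced by a direct in-process method call so the example stays self-contained.

```typescript
// Hypothetical sketch: BuddyNode, Change and readWithFailover are illustrative names.
interface Change {
  user: string;
  key: string;
  value: unknown;
}

class BuddyNode {
  private data = new Map<string, unknown>();
  private buddy: BuddyNode | null = null;
  alive = true; // flipped to false to simulate a failed instance

  // Stand-in for opening the persistent connection between two instances.
  static pair(a: BuddyNode, b: BuddyNode): void {
    a.buddy = b;
    b.buddy = a;
  }

  private static slotKey(user: string, key: string): string {
    return JSON.stringify([user, key]);
  }

  // Apply a change without forwarding it, so notifications never loop.
  private apply(change: Change): void {
    this.data.set(BuddyNode.slotKey(change.user, change.key), change.value);
  }

  // Write issued by the app: apply locally, then notify the buddy if it is up.
  write(change: Change): void {
    if (!this.alive) throw new Error("node is down");
    this.apply(change);
    if (this.buddy && this.buddy.alive) this.buddy.apply(change);
  }

  read(user: string, key: string): unknown {
    if (!this.alive) throw new Error("node is down");
    return this.data.get(BuddyNode.slotKey(user, key));
  }
}

// App-side helper: try the primary, then issue the same request to its buddy.
function readWithFailover(primary: BuddyNode, buddy: BuddyNode, user: string, key: string): unknown {
  try {
    return primary.read(user, key);
  } catch {
    return buddy.read(user, key);
  }
}
```

What this leaves out is exactly the hard part mentioned above: it does nothing to resynchronize a buddy that comes back after being down, and it ignores lost notifications on a real network link.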

Scaling would be even easier: just put user IDs into different buckets, with each bucket being a replicated instance. That is, if scaling is even necessary.
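
The bucketing could be as small as the following TypeScript sketch. The function names and the choice of an FNV-1a style hash are my assumptions, and any stable hash would do. If user IDs are already integers, a plain modulo would be enough.

```typescript
// Hypothetical sketch: hashUserId and bucketFor are illustrative names.

// FNV-1a style 32-bit hash computed over the UTF-16 code units of the ID.
function hashUserId(userId: string): number {
  let hash = 2166136261;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 16777619);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a user ID to a bucket index in the range [0, bucketCount).
function bucketFor(userId: string, bucketCount: number): number {
  if (!Number.isInteger(bucketCount) || bucketCount <= 0) {
    throw new Error("bucketCount must be a positive integer");
  }
  return hashUserId(userId) % bucketCount;
}
```

The app would keep a list of replicated instance pairs and send each request to the pair at index `bucketFor(userId, pairs.length)`. One caveat of plain modulo bucketing: changing the number of buckets remaps most users, so their datasets would have to be reloaded from the background store.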

And the beauty of it is that you have to implement this just once, no matter how many different server-side apps and languages you use. It would be a common piece of infrastructure.