by WALoeIII 5097 days ago
You can't just put nodes in different regions, even with a database like MongoDB. It will work in theory, but in practice you'll have all kinds of latency problems.

WAN replication is a hard problem and glossing over it by waving your hands is a disservice to readers.

"Real" solutions are to run a database that is tolerant of partitioning, and have application level code to resolve the inevitable conflicts. Riak, Cassandra and other Dynamo inspired projects offer this. On the other hand you can use a more consistent store and hide the latency with write-through caching (this is how Facebook does it with memcached + MySQL), but now you have application code that deals with managing this cache.

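To make "application level code to resolve the inevitable conflicts" concrete, here is a minimal sketch, not how Riak or Cassandra do it internally. It assumes the store hands back every conflicting version (a "sibling") of a value and the application merges them. The `Sibling` type, the region names, and the union-merge rule for a shopping cart are all hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sibling:
    """One conflicting version of a value, as written in one region (hypothetical type)."""
    region: str
    items: frozenset


def merge_siblings(siblings):
    """Resolve a conflict by taking the union of the items in every sibling.

    Union never loses an addition, but an item removed in one region
    reappears if another region still has it.
    """
    merged = frozenset()
    for sibling in siblings:
        merged |= sibling.items
    return merged


# Two regions accepted writes to the same cart during a partition.
us = Sibling("us-east", frozenset({"book", "lamp"}))
eu = Sibling("eu-west", frozenset({"book", "kettle"}))
cart = merge_siblings([us, eu])
```

The merge rule is specific to the data: union is tolerable for a cart, but a counter or an account balance needs a different rule, which is why this code ends up living in the application.
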
Either way you need very specific application code to handle these scenarios, and you may even run a combination of solutions for the different types of data you need to store. There is no silver bullet: no framework or product does it for you.

3 comments

Most of the current MongoDB drivers support routing read queries to the lowest-latency replica set member - this solves part of the problem.
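
Roughly, the idea behind lowest-latency read routing looks like the sketch below. This is not any driver's actual code: the hostnames, the measured latencies, and the 15 ms window are assumptions for illustration. The client keeps a recent round-trip time per member and reads from one of the members close to the fastest.

```python
import random


def pick_nearest(latencies_ms, threshold_ms=15.0, rng=random):
    """Pick a member whose latency is within threshold_ms of the fastest one.

    latencies_ms maps a member's hostname to its recent round-trip time.
    Choosing randomly among the eligible members spreads the read load.
    """
    fastest = min(latencies_ms.values())
    eligible = sorted(
        host for host, ms in latencies_ms.items() if ms - fastest <= threshold_ms
    )
    return rng.choice(eligible)


# Hypothetical members as seen from a client running in us-east.
latencies = {
    "db-us-east.example": 2.0,
    "db-us-west.example": 70.0,
    "db-eu-west.example": 90.0,
}
nearest = pick_nearest(latencies)
```

This only covers reads. Writes still go to the primary, so a client far from the primary pays the cross-region round trip on every write, and reads served by a secondary can be stale.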

Choosing a database when planning a project is the more fundamental problem: knowing what to use and why. Is the trade-off for Riak / Cassandra worth it over MongoDB or even MySQL? People decide this on a per-project basis and, of course, when starting out they don't always make the right longer-term choice.

Running a multi-region Cassandra cluster is ill-advised. Cassandra (like Dynamo databases in general) is quite chatty. I think Netflix has implemented some multi-region clusters. Other companies too, I'm sure. But it will certainly give you heaps of new challenges (and heaping bandwidth bills).
Your information is obsolete. Cassandra will only send a single copy of your updates cross-region, which will then be rereplicated within each region if necessary.
That's certainly good news. Appreciate the correction.
I agree wholeheartedly. HA is hard; a single blog post won't even halfway cover it.
HA isn't actually hard, but it does require some forethought and choosing some technologies which aren't necessarily cool. However, I would submit that building HA capabilities into a cat photo sharing or microblogging site is sort of overkill; most people don't need it. Just take the hit and move on. People are getting more and more used to sites being down or failing, and they just retry later. As much as I hate to say it, I do think it's fairly accurate.
You pose an interesting question that I want to see answered: do users of B2C sites actually care when the site is down? Does it decrease MAU? Your intuition is that it does not, but I'd love to see some data.
Agreed, this is just a rough guide to it :-)
A really rough guide. Rough enough that I thought of each of these points last night at 1 am while extremely groggy and trying to bring back the 7 or so instances we lost in the outage.

I guess it is a good play to get traffic to your site.