| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by rdtsc 4117 days ago

Keeping an always consistent state in a large distributed you are fighting against the law of physics.

Like it is mentioned, Google did it with their F1/Spanner SQL database. But that also mean GPS receivers with antennas on the roofs of data centers.

Which is yet another thing that can fail and it, either by itself or in a cascade of other failures will lead to unspecified and possibly undesirable behavior.

Recently I see a lot of advocates of dropping NoSQL databases and moving back to Postgress or other SQL databases.

The problem is SQL and schemas is not the only reason NoSQL databases became popular, they became popular because they also started to have a default and more defined behavior in respect to replication and distribution.

Most solutions don't need that and sticking with a solid single database works very well. But those that need distributed operation have a pretty hard task ahead of them.

One heuristic you can look at is if and how is distribution implemented. Is it something that is bolted on top, like you download some proxy or addon, or added as an afterthought, be very careful. Those things should be baked into the core. For example, does it support CRTDs? Does it have master to master replication and so on. If it claims to have consistency instead availability figure out how it is implemented, Paxos, Raft, or something else.

So far I think Riak is probably the database that thought the hardest and did the most reasonable job. The other simpler database is CouchDB, they have well specified conflict resolution behavior and master to master replication. But you have to usually write your own cluster topology. There are probably others but those are the two I know of first hand.

3 comments

logicallee 4117 days ago

I'm not an expert, and it sounds like you are, so I appreciate your feedback here:

what do you even mean by a consistent state? even in theory a person initiating a new additional record in Auckland, New Zealand at the same time somebody iniatiates a change in Gibraltar or London (which are antipodal to the former[1]) 66 milliseconds away, cannot have a confirmation in less than 120 milliseconds, right? So do you just wait for that before declaring 'consistency'? Do you literally add 120 milliseconds to each and every request? (And this is assuming you have a damned good solution to the two generals problem)? I mean suppose the database tracks something as simple as: number of web page hits. It's a counter. You now distribute it, and have a stochastic process of counter hits between 0 and 5 per second in your largest cities, distributed throughout the world. How can that database ever be consistent?

If there are ten new records per second in New Zealand and ten records per second in the UK, and they potentially depend on each other in some way, are you going to just make everyone wait until everything has been committed and confirmed to be consistent? Or is "a foolish consistency the hobgoblin of little minds", and you really can accept out-of-date data and deal with merging conflicts later?

I just don't understand why we would expect consistency to rank up there, when we deal with a worldwide real-time system where the difference between getting served by a local database in 40 milliesconds and one far away in 250 milliseconds is both staggering, and incredibly noticeable. why be consistent? what is consistence?