Hacker News new | ask | show | jobs
by rdtsc 4117 days ago
Keeping an always consistent state in a large distributed you are fighting against the law of physics.

Like it is mentioned, Google did it with their F1/Spanner SQL database. But that also mean GPS receivers with antennas on the roofs of data centers.

Which is yet another thing that can fail and it, either by itself or in a cascade of other failures will lead to unspecified and possibly undesirable behavior.

Recently I see a lot of advocates of dropping NoSQL databases and moving back to Postgress or other SQL databases.

The problem is SQL and schemas is not the only reason NoSQL databases became popular, they became popular because they also started to have a default and more defined behavior in respect to replication and distribution.

Most solutions don't need that and sticking with a solid single database works very well. But those that need distributed operation have a pretty hard task ahead of them.

One heuristic you can look at is if and how is distribution implemented. Is it something that is bolted on top, like you download some proxy or addon, or added as an afterthought, be very careful. Those things should be baked into the core. For example, does it support CRTDs? Does it have master to master replication and so on. If it claims to have consistency instead availability figure out how it is implemented, Paxos, Raft, or something else.

So far I think Riak is probably the database that thought the hardest and did the most reasonable job. The other simpler database is CouchDB, they have well specified conflict resolution behavior and master to master replication. But you have to usually write your own cluster topology. There are probably others but those are the two I know of first hand.

3 comments

I'm not an expert, and it sounds like you are, so I appreciate your feedback here:

what do you even mean by a consistent state? even in theory a person initiating a new additional record in Auckland, New Zealand at the same time somebody iniatiates a change in Gibraltar or London (which are antipodal to the former[1]) 66 milliseconds away, cannot have a confirmation in less than 120 milliseconds, right? So do you just wait for that before declaring 'consistency'? Do you literally add 120 milliseconds to each and every request? (And this is assuming you have a damned good solution to the two generals problem)? I mean suppose the database tracks something as simple as: number of web page hits. It's a counter. You now distribute it, and have a stochastic process of counter hits between 0 and 5 per second in your largest cities, distributed throughout the world. How can that database ever be consistent?

If there are ten new records per second in New Zealand and ten records per second in the UK, and they potentially depend on each other in some way, are you going to just make everyone wait until everything has been committed and confirmed to be consistent? Or is "a foolish consistency the hobgoblin of little minds", and you really can accept out-of-date data and deal with merging conflicts later?

I just don't understand why we would expect consistency to rank up there, when we deal with a worldwide real-time system where the difference between getting served by a local database in 40 milliesconds and one far away in 250 milliseconds is both staggering, and incredibly noticeable. why be consistent? what is consistence?

[1] http://www.findlatitudeandlongitude.com/antipode-map/

> Do you literally add 120 milliseconds to each and every request?

Yap, you have to add all the delay until you get confirmation from the members of the cluster that the write was written. And you have to have linearizability, so that anyone reading after that (and one can argue what 'that' and where 'that' is) will now see the new state. If one of the members has failed you could potentially be stuck forever waiting. Now you also have to make sure how cluster membership and connectivity is defines and what are the possible state and transition during membership change, coupled with network partitions, coupled with hardware failures.

In other words you are fighting against the laws of physics. It is expensive and hard to do.

In case of the counters, one should ask is it worth it. Or is an CRDT based counter (that will eventually converge) good enough.

Even banks are eventually consistent. They choose to be available first. So you can withdraw $100 in New Zeland and then $100 in New York with a short period of time even if you only have $100 in your account. Inconsistency is handled later when you get a letter that your account is overdrawn.

I've spent too much of my life on this problem :).

The short version is that you can expose the different read/write modes to the client, and let them decide on what works best for their use case.

For spanner, table 2 in the paper[1] highlights the different interaction modes (read/write transaction, read wait for pending transactions, read skip pending transactions, read at some past timestamp). Each operation gives you a "consistent" view of the data, where "consistent" means that you don't see partial transactions.

It turns out that, for most high-qps web applications, you want a recent consistent view of the data, but you don't care about pending transactions. That means you can read at "now", or at "now - 1 second" without any latency penalty.

Clients doing read/write will care about pending transactions, but you can be opportunistic if the transactions do not overlap. Settling transactions does require quorum synchronization, so you are limited to max(median+1(latency)) milliseconds. These databases are usually keyed by something that minimizes transaction overlap (e.g. you can shard a counter, that kind of thing).

Regional migration of quorums (e.g. counter for asia, counter for europe) can also be done. So, the counter for "asia" consists in a "segment" (not the actual terminology, but we'll go with it) where the quorum exists mostly distinct asia regions. Metadata for the segment (which everything reads at startup, and is kept up to date) will tell you what servers are responsible.

[1] http://static.googleusercontent.com/media/research.google.co...

There are many systems which can work just fine in an eventually-consistent manner. A database of people (customers, users, etc) is a classic example of such a thing.

In general I think consistency is over valued. There are plenty of cases where it is important. Lots of people are brainwashed in college to think that all data must be consistent all the time, and that's just not necessary.

>Lots of people are brainwashed in college to think that all data must be consistent all the time, and that's just not necessary.

I knew it! So a foolish consistency is the hobgoblin of little minds.

While I agree strict consistency is probably overkill in many if not most situations, the problem with not having consistency is that it potentially makes the application logic much more complicated.

Take the database of customers - so if you don't have consistency, what happens in case someone changes the company address and another person simultaneously requests a delivery of something. Do you risk ending up with half of the old address and half of the new one on the parcel?

Note you can certainly have this problem in a consistent system too, e.g. if you make a UI without a save button where the address is changed one field at a time.

Concurrency is just intrinsically hard.

Note that the real world operates like this too: before computers, if someone changes their address and simultaneously sends a package, the package probably will end up at the wrong address. We have a number of mechanisms in place to mitigate this when it occurs (address forwarding, return-to-sender, customer support, credit card chargebacks), but they still don't always work, and sometimes packages just get lost.

The real world solution to this is the acceptance that yes, sometimes bad things happen for no reason at all. I suspect that the computer world will eventually move to this as well, with consumers becoming more tolerant of machines that simply give the wrong answer some of the time, as long as they give the wrong answer less frequently as a human would.

"Lack of consistency" here doesn't have to imply lack of atomicity; In the case of your example, an eventually consistent system could return either the old address or the new address at some point in time X, but never a mixture of both. In your situation, the parcel could end up getting delivered to the old address, but there would be no corruption.
Couldn't agree more. Most people don't know that many critical systems e.g. banking are eventually consistent and rely on other methods.

The concepts of CQRS and Event Sourcing really need to be taught at universities.

Given a set of perfectly synchronized distributed clocks you may not even have to wait the 66ms. Incoming transactions (both local and remote) go into the write-ahead log, their ordering is given by the timestamps (which are consistent, because clocks are synchronized). Periodic heartbeats from other nodes give you a green light to commit or abort parts of the write-ahead log into permanent database. All nodes will make the same decisions wrt each transaction.

You will only have to wait for the commited/aborted response, which cannot be achieved faster than 2*66=132ms (and this system can come arbitrarily close to that, by increasing the heartbeat frequency).

There is no need to wait any time before running a subsequent transaction though. Confirmations will flow with a 132ms delay, but there is no limit on transactions concurrency.

Warning, this is a very dangerous practice. The whole point of the article states that machines do fail and that networks are unreliable. He had a specific section in their that even Spanner has a 7ms uncertainty on time.

Therefore: You CANNOT trust timestamps or your clock.

Yes, I know. This is a thought experiment, as there is really no such thing as perfectly synchronized clocks.

Also despite the need for futuristic clocks the system lacks resistance to failure (no heartbeat from arbitrary node -> no transaction gets commited and no transaction gets aborted -> a deadlock). Maybe this is fixable with Paxos, I'm not sure.

Nice points, have to mention Aerospike as another that did a lot of the right stuff with regard to distributed actions and clustering.
> Recently I see a lot of advocates of dropping NoSQL databases and moving back to Postgress or other SQL databases.

This is just from a vocal minority on HN. You just need to look at the facts.

Companies like Mongo, Datastax, Aerospike etc are growing bigger by the day, with increasingly higher valuations. Old school database companies like Teradata are now all about datalakes incorporating Hadoop and Mongo. And technologies like Spark, Impala are now on the front line for many data analytics and processing work.

In the enterprise at least SQL databases are increasingly being relegated to a small part of the whole data pipeline i.e. storing the consolidated, integrated data model.

The advocates of dropping NoSQL and returning to SQL databases are indeed a minority — as the majority never went away from SQL databases, especially for the purposes they fulfil well.

Right tool for the job and all that stuff.

Again the facts simply don't agree with you. Companies have been moving away from SQL databases in droves compared to the 1990s. And why wouldn't they ? Vendors like Oracle, Microsoft, IBM etc have been screwing them over for ages. Low cost data lakes are the new norm.

Of course I am talking about middle to enterprise companies. I am sure for individuals a typical LAMP setup is still going to be fine. But then again many of those are running their apps in the cloud and hence want a database that is resilient in the face of node outages. MySQL and PostgreSQL are both a PITA to get this right compared to almost every NoSQL database.

> Again the facts simply don't agree with you. Companies have been moving away from SQL databases in droves compared to the 1990s.

Do you have any links to said 'facts'? Otherwise your comments are just hearsay and anecdote.

Indeed. I suppose it depends on the circles in which one runs but there are still companies actively writing COBOL (or were as of 2011 as far as I can personally verify).

And they were making millions of dollars from that one small segment of the company.

I'd wager that in the enterprise (whatever that means) that where NoSQL is used in companies more than, say, 10 years old, it is generally for non-critical, exploratory one off projects. I don't even think NoSQL is close to 50% market share among what I'll call the silent majority of more conservative, more enterprisey tech companies/IT departments. I have no figures to back that intuition up with, however.