Hacker News new | ask | show | jobs
by tracker1 4057 days ago
It really depends on the amount of writes your database needs to handle, and the overall quantity of data. There are very strong CAP systems (Zookeeper comes to mind), but I wouldn't want that for very large quantities of data, as it wouldn't be able to keep up.

For example, if you are doing logging for a few million simultaneous users (say 5-8 per resource request across service layers), a single system wouldn't be able to keep up, and definitely a single system that has to coordinate each write as an atomic/consistent state.

The fact is, depending on your needs, you will need to sacrifice something to reach a scale needed by some systems and to minimize down time. It's a trade-off. And any distributed database will have real-world weaknesses.

1 comments

> There are very strong CAP systems

There are not strong CAP systems -- that is the whole point of the CAP theorem. It's impossible. Am I misunderstanding you?

No, zookeeper doesn't do magic. It does, however, create a large enough value for P to be lost that it can be thought of as CAP; The number of failures required is high enough that, in normal use and sane configurations, you won't partition.
P (partition tolerance) refers to network partitions. There's nothing a distributed system can do to prevent network partitions; they're a property of the network. The question is, in the face of partitions, what does the software do? It basically has two options:

* Respond to requests, even though it may not have the most up to date information. I.e., it sacrifices consistency. These are AP systems.

* Not respond to requests, in which case it sacrifices availability. These are CP systems, of which ZK is one.

In particular with ZK, if you lose quorum (i.e., the cluster has fewer than (n + 1) / 2 active nodes where n is the cluster size) the cluster (or partition thereof) will become unavailable in order to avoid sacrificing consistency.

So it should be called the CA theorem, right? You can't choose partition-intolerance.
Non-distributed databases (like postgres) are sometimes considered to be CA. But I think it's clear that CA doesn't make sense in a distributed system.