| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by TBurette 1864 days ago

The CAP theorem applies whenever you have a distributed data store. This means a system you can write data to and read back from. This system is made of multiple nodes connected through a network and the clients can connect and make read/write requests from any node.

If you don't meet those conditions CAP doesn't apply to you. For example :

- an SQL database with master/slave replication but all the clients only ever access the database from the same nod (the clients will always have a consistent view of the one node they access)

- multiple processes on the same machine doesn't apply (it's a concurrent datastore but not a distributed one)

- a single SQL database instance doesn't apply (data is not distributed even if there are multiple clients. Confusingly enough if clients can keep some data and work in offline mode, then it would apply because the clients would be considered as a node of the system)

CAP was clearly created in the context of distributed database : SQL, noSQL, CDN,... Can it apply to an entire system made of multiple components such as database, multiple services,... ? Things get muddier but I'd say yes as long as you check all the boxes: a system with data spread on multiple nodes that clients can access to where the connection can get severed.

Application servers with no local copy losing connection to the server : no. Where is the distribution of data? Where can a network failure partition the data? There is not much A or C choice to make when a partition occurs. Applying CAP wouldn't be very interesting here.

However application servers with local copy : yes CAP applies. In case of a partition there is a trade-off to be made between C and A.

To complicate things further, CAP has basically two versions. There is Brewer's version which talks in general terms is not formal and gets hazy when you drill down to the details. The other version is the one of Lynchs's Paper that provided a proof for the CAP theorem. In that version the definitions are very strict and consequently many real worlds aspects don't fit. Depending on who you ask you may get different answers.

For example, take the context of a database distributed on multiple servers in a datacenter with a reverse proxy that can detect partitions and always redirect read/write queries to non-partitioned nodes. In that context you could say that you have all three C, A and P because there is never going to be a request on a partitioned node. According to Lynch's version of CAP this is not C consistent because you have non-failing nodes that can read stale data (the question of whether there is going to be any request or not is outside the purview of the proof).