| HN Mirror



	by tsimionescu 830 days ago
	No, by definition, a partition event is one in which some of the nodes are entirely inaccessible to every other node in the cluster. Even if there were two different US nodes, and only one node lost connectivity to the others, the problem wouldn't change at all: the branch which only has access to the disconnected node would still either send shipments to the wrong address or stop being able to read the status at all.


andras_gerlits 829 days ago

We first need to clarify whether we're talking about CAP or real life. CAP requirements are absurd. To quote Dominik Tornow from here:

https://blog.dtornow.com/the-cap-theorem.-the-bad-the-bad-th...

"Note that Gilbert and Lynch’s definition requires any non-failed node, not just some non-failed node, to be able to generate valid responses, even if that node is isolated from other nodes. That is an unnecessary and unrealistic restriction. In essence, that restriction renders consensus protocols not highly available, because only some nodes, the nodes in the majority, are able to respond."
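To make the distinction in that quote concrete, here is a minimal sketch, assuming a simple majority-quorum rule. The function names and the node labels are hypothetical and only for illustration; this is not taken from the article or from any particular consensus implementation.

```python
def majority_size(cluster_size):
    """Smallest group that still counts as a majority of the cluster."""
    return cluster_size // 2 + 1


def nodes_able_to_respond(cluster_size, partition_groups):
    """Nodes that can answer under a majority-quorum rule (an assumed rule).

    partition_groups is a list of lists; nodes in the same inner list can
    still talk to each other, nodes in different lists cannot.
    """
    needed = majority_size(cluster_size)
    return [node
            for group in partition_groups
            if len(group) >= needed
            for node in group]


def available_in_gilbert_lynch_sense(cluster_size, partition_groups):
    """True only if every non-failed node can answer, isolated or not."""
    non_failed = sum(len(group) for group in partition_groups)
    responders = nodes_able_to_respond(cluster_size, partition_groups)
    return len(responders) == non_failed


# Five nodes split 3 / 2: only the majority side keeps answering.
groups = [["n1", "n2", "n3"], ["n4", "n5"]]
responders = nodes_able_to_respond(5, groups)
strictly_available = available_in_gilbert_lynch_sense(5, groups)
```

With the 3 / 2 split, `responders` holds only the three majority-side nodes and `strictly_available` is `False`, which is the sense in which the quote says consensus protocols are "not highly available" under that definition.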

Partition with a capital P doesn't really move me: our experience contradicts its assertions, because the model it bases its statements on is fundamentally broken.

So let's talk about partition in the real world. Practically speaking, partitions mean that some nodes experience high latency when communicating. If you need to update some record in a specific data-centre, you can only talk to that place via a single channel, and that channel is disrupted, then yes, you're going to experience a slowdown or even a halt. Nowadays, however, even widely used consensus algorithms can mitigate that, and the delinquent node will eventually be dropped from the group if it causes enough problems. We don't do anything very different in this regard. Our deterministic ledger can be replicated across multiple data-centres (as no nodes in it create original information), observers rely only on records the ledger has already sent out, and the only reason you would need to talk to this ledger is to modify a record shared with other observers. So you can always pick a different ledger instance; there are no "master copies" anywhere. Remember that we can stream time-information with the data, so the client can always calculate its "point in time" and reconstruct an (even globally) consistent view of the data it has.

Sure, if you choose to centralise all your data behind a single flaky connection, you're gonna have a bad time. The point is that the setup we built allows you to not need to do that and it does that transparently, behind SQL semantics.


tsimionescu 829 days ago

No, the CAP requirements are not at all as absurd as that article claims.

For the specific quote you gave, that is an obvious assumption. A client only has access to some of the nodes in the distributed system. Of course we want any node to give the correct answer - the whole purpose is to reduce the burden on the client. The client is not responsible for searching all of the nodes. And note that the proof doesn't actually require that all nodes return the right answer - the contradiction is reached as long as all the nodes that the client has access to return the wrong answer.

Another bad claim in the article is that the proof of CAP requires that the partition is permanent. Maybe it's written like that for simplicity, but it obviously only requires the partition to be longer than the client's bound on response time. If the client is willing to wait an hour for a response, then any partition event that's two hours long will lead to the same conclusion. Since clients never have unbounded time to wait, and since partition duration is unbounded even if not permanent, the argument still holds.
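The timing argument can be sketched in a few lines. This is a toy model with hypothetical names (`client_outcome`, `answer_while_partitioned`), assuming the node either answers immediately from local state or waits for the partition to heal:

```python
HOUR = 3600  # seconds


def client_outcome(partition_s, timeout_s, answer_while_partitioned):
    """What the client observes when it queries a node cut off for partition_s seconds.

    Toy model (an assumption, not a real protocol): the node either answers
    at once from its local state, or blocks until the partition heals.
    """
    if partition_s == 0:
        return "fresh answer"              # no partition, nothing to trade off
    if answer_while_partitioned:
        return "possibly stale answer"     # stays available, gives up consistency
    if partition_s > timeout_s:
        return "timeout"                   # stays consistent, gives up availability
    return "fresh answer after waiting"    # partition healed within the client's bound


# The example from the comment: the client waits one hour, the partition lasts two.
waiting = client_outcome(2 * HOUR, 1 * HOUR, answer_while_partitioned=False)
answering = client_outcome(2 * HOUR, 1 * HOUR, answer_while_partitioned=True)
```

Here `waiting` is `"timeout"` and `answering` is `"possibly stale answer"`: once the partition outlasts the client's bound, neither choice gives a fresh answer in time, and the partition never had to be permanent for that to happen.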

Also, major network outages that disconnect whole regions of the internet for hours from the rest of the world happen somewhat regularly (more than once a year). Whole AWS regions have become disconnected, ~half of Japan was disconnected for a few hours, Ukraine has been disconnected several times, etc. If you run a large distributed system for a significant amount of time, you will hit such events with probability approaching 1.


andras_gerlits 829 days ago

I can only repeat what I told you earlier. Our distributed consistency model meets the SQL-standard's requirements for consistency and tolerates such outages. This is a fact.

CAP is a bad model for more reasons than the ones listed in that article. My favourite one is that it requires linearizability, which nobody does in SQL. To me, saying "SQL is not consistent" is just too big a disconnect. CAP is based on a badly defined idea that comes from a presentation that was wrong in what it said.

That you need to tolerate outages of entire regions is a good argument to make in itself; there's no need to point at CAP. My answer to that is that there's a way to define consistency that lets it handle partition problems more gracefully, and that is the model we show. If you require communication to establish consistency and stream the changes associated with the specific timeslot at the same time, partition means that the global state will move on without the changes from the partitioned areas and that they will show up once the region rejoins the global network. While separated, they can still do (SQL-) consistent reads and writes for (intra-region) records they can modify.


tsimionescu 829 days ago

Are you saying that it's possible for an SQL server to successfully commit a transaction in which you modify a record and then, in a subsequent query, return the old value of that record? I very much doubt this is true of any single-node SQL database.

In contrast, any distributed system has this property (or it may refuse the query entirely) in the face of partitions.
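A toy two-replica model shows both outcomes mentioned here: a stale read, or a refused query. Everything in it (`ToyCluster`, `Replica`, `Unavailable`, the `refuse_when_partitioned` flag) is hypothetical, and the write is accepted locally purely to keep the sketch short; it does not model any real database or the system discussed in this thread.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk a stale read."""


class Replica:
    def __init__(self, name, value):
        self.name = name
        self.value = value


class ToyCluster:
    """Two replicas that copy writes to each other unless partitioned."""

    def __init__(self, initial, refuse_when_partitioned):
        self.a = Replica("a", initial)
        self.b = Replica("b", initial)
        self.partitioned = False
        self.refuse_when_partitioned = refuse_when_partitioned

    def write(self, replica, value):
        # The write always commits locally (a simplifying assumption).
        replica.value = value
        if not self.partitioned:
            other = self.b if replica is self.a else self.a
            other.value = value  # replication only works while connected

    def read(self, replica):
        if self.partitioned and self.refuse_when_partitioned:
            raise Unavailable(f"replica {replica.name} cannot confirm it is up to date")
        return replica.value


# Stays available: the committed write is invisible on the other side.
cluster = ToyCluster("old address", refuse_when_partitioned=False)
cluster.partitioned = True
cluster.write(cluster.a, "new address")
stale = cluster.read(cluster.b)
```

After these steps `stale` is `"old address"` even though the write to replica `a` succeeded. Constructing the cluster with `refuse_when_partitioned=True` makes the same read raise `Unavailable` instead, which is the other branch of the trade-off.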


mrkeen 829 days ago

Thanks for the link.

> https://blog.dtornow.com/the-cap-theorem.-the-bad-the-bad-th...

Especially:

> The “Pick 2 out of 3” interpretation implies that network partitions are optional, something you can opt-in or opt-out of. However, network partitions are inevitable in a realistic system model. Therefore, the ability to tolerate partitions is a non-negotiable requirement, rather than an optional attribute.

> CAP requirements are absurd

Yes! Literally. One would roundly ridicule someone who claimed to have met (or exceeded) those requirements.


andras_gerlits 829 days ago

CAP means many different things. If you took the time to read what I have to say about it, you would know that I'm saying that we're beating the requirements Brewer sets out in his original presentation, where he introduces the concept of the C-A-P tradeoff. He's clearly wrong in what he says in the presentation, which is what we say we're beating. We can say this, because we're meeting the requirements for "C" there (DBMS-consistency) and because we don't suffer the trade-offs mentioned there. In fact, our system can be both available and partition-tolerant with a definition of "C" that matches the ones laid out in the SQL-spec, as the reads are always local. The SQL-standard doesn't mandate time-related availability guarantees for writes.
