Hacker News new | ask | show | jobs
by blantonl 5534 days ago
Regardless, I do agree that building your application today like it is a solved problem is the wrong way to do it.

That presumption assumes that the application is being used as the right tool to resolve the problem. And it also assumes that "the problem" is a finite and solvable item.

1 comments

> And it also assumes that "the problem" is a finite and solvable item.

Yes. To make this a bit more concrete, if "the problem" is making distributed storage look and behave exactly like local storage, the CAP Theorem has something to say about its solvability.

Depends. Local storage is also not perfectly available. If the network is reliable, you can probably get availability high enough that the system feels "close enough" to how local storage feels. Today's networks aren't that reliable, but someday there may be enough redundancy and bandwidth for this to happen.
> Local storage is also not perfectly available.

Technically true, although you don't have to contend with the consistency or partitioning factors in the local disk case -- there's only one copy of the state. This means you can focus on making the availability factor as close to 1.0 as possible.

This may not be the case when you're forced to balance all three CAP factors. I sometimes wonder if a follow on result to CAP will be a "practical" (physical or information theoretic) limit like C x A x P <= 1-h for some constant h, and we'll just have to come to terms with that as computer scientists, as physics had to with dx x dp >= h. This is of course wildly unsubstantiated pessimism.

Also, I would gladly entertain any argument demolishing the "local disks are not subject to CAP" claim I made above by talking about read / write caches as separate copies of the local disk state.

I doubt it. Suppose that there is a network that is never partitioned, and machines connected to that network that never fail. In that case consistency and availability should be perfect. Although networks will never be perfectly reliable, nor machines, they seem to be getting more reliable. Perhaps someday we may be able to say that the odds of enough partitions or machine failures to make the system unavailable are lower than the odds of you getting struck by lightning, at which point you will have for practical purposes defeated the constraints of the CAP theorem.
> Suppose that there is a network that is never partitioned, and machines connected to that network that never fail. In that case consistency and availability should be perfect.

You mean, in that case tolerance to partition and availability should be perfect.

> Perhaps someday we may be able to say that the odds of enough partitions or machine failures to make the system unavailable are lower than the odds of you getting struck by lightning, at which point you will have for practical purposes defeated the constraints of the CAP theorem.

So this is the really interesting question. All the CAP theorem says is that (C,A,P) != (1.0,1.0,1.0). How close to (1.0,1.0,1.0) could we make (C,A,P)? If infinitely close, then we have achieved perfection by the limit, and the CAP theorem is rather pointless. If not, then what is the numeric limit?

As you speculate, maybe the numeric limit on C x A x P is so close to 1.0 that the odds of seeing a consistency, availability, or partitioning problem are much smaller than getting hit by lightning.

Then again, maybe not. Who knows? ;)

To avoid sounding like a total crackpot, here is an interesting paper that explores the physical limits of computation:

http://arxiv.org/pdf/quant-ph/9908043v3

> You mean, in that case tolerance to partition and availability should be perfect.

No. If a network is never partitioned, you don't need to write algorithms that can tolerate partitions. Therefore consistency and availability are possible.

> So this is the really interesting question. All the CAP theorem says is that (C,A,P) != (1.0,1.0,1.0). How close to (1.0,1.0,1.0) could we make (C,A,P)? If infinitely close, then we have achieved perfection by the limit, and the CAP theorem is rather pointless. If not, then what is the numeric limit?

I think you have misunderstood the theorem (at least, if my bachelor-degree-level understanding is correct). C, A, and P are not variables you can multiply together or perform mathematical operations on. They are more like booleans. "Is the web service consistent (are requests made against it atomically successful or unsuccessful)?" "Is the web service available (will all requests to it terminate)?" "Is the web service partition-tolerant (will the other properties still hold if some nodes in the system cannot communicate with others)?" These questions cannot be "0.5 yes". They are either all-the-way-yes or all-the-way-no.

> . . . and the CAP theorem is rather pointless

Not really. It is pointful for networks that experience partitions. It just doesn't apply to reliable networks. It also sort-of doesn't apply when an unreliable network is acting reliably, with the caveat that since it is not possible to tell in advance when a network will stop behaving reliably, you still have to choose between these three properties when writing your algorithms for when the network behaves badly.

Out of interest, where do you draw the line between local storage and distributed storage? By local storage do you mean directly a attached storage?

What about FC SANs or iSCSI over a WAN? Are they local or distributed?