Hacker News new | ask | show | jobs
by tel 4399 days ago
Distributed systems are really hard. This line has been endlessly parroted and, also, endlessly reproven. They are really hard.

The major reason why they're hard is that your standard reasoning toolbox is insufficient. Understanding distributed systems means being aware of a much larger state space and how things like special relativity impact it (I'm being a bit flippant, but not actually inaccurate—message latency is not poorly modeled by special relativity). Most programmers use reasoning methods like operational modeling (single threaded), guess-and-check, and unit testing. These are simply completely insufficient.

If you want to model full distributed systems you might want to use something like algebraic topology. It's a useful tool for understanding the entire state space of a system and its valid transitions. It can be used to prove Paxos or Raft are correct. https://www.ideals.illinois.edu/bitstream/handle/2142/33762/...

Another method is to simplify a whole lot what a "distributed system" is and hope to find a more analyzable model where more standard reasoning tools work. CSP is one of these. It's a simpler model where some of the reasoning tools of regular formal languages can be used to understand its operation.

The upshot is that CSP won't be as powerful as you could imagine possible when building a distributed system, but it'll be massively more manageable so long as its sufficient for your goals. It's, as such, formed the basis for some popular distributed tools like Erlang.

2 comments

> Understanding distributed systems means being aware of a much larger state space and how things like special relativity impact it (I'm being a bit flippant, but not actually inaccurate—message latency is not poorly modeled by special relativity).

This sound really interesting. Any recommendations to learn more about it?

The classic paper on distributed clocks mentions it, Lamport's "Time, Clocks, and the Ordering of Events in a Distributed System" (http://research.microsoft.com/en-us/um/people/lamport/pubs/t...)
You reminded me of this - http://www.informit.com/articles/article.aspx?p=1768317

There are some interesting differences between CSP and Erlang (and Go, which arguably implements CSP a bit more closely than Erlang).

Yeah, to be clear: pure implementations of CSP are probably useless but "near" implementations are abound. I don't know Go, so Erlang came to mind... But it's definitely a little way from pure CSP.
I suspect most "pure" models are nigh useless on their own. They all require a bit of compromise to interact with the world, or exist as a subset of some other language/tool. However, they can be awesome for describing in specifications or analyses of systems. And when languages support them in some fashion (even if impure) it allows us to reduce the semantic distance between specification/requiremnets and implementation to a point where we can be confident that we've done things correctly.
Haskell is an interesting playground for this since it tries so hard to make "pure" practical. Interestingly, they achieve this to a large degree using monads. I think that idea is fairly generalizable, though, that pure formalisms can embed impure ones as models and then have those be executed by an RTS.