by jrullmann 4323 days ago
Great article. A lot of engineers don't have personal experience with these kinds of network failures, so sharing stories of their consequences means more engineers can make informed (and conscious) decisions about how much risk their applications can tolerate.

One thing that you could glean from this article, and I think this is incorrect, is that the application or operations engineer is responsible for understanding the nuances of distributed systems. In my experience the number of people who rely on distributed systems is much larger than the number of people who understand these issues.

So what we really need are systems we can build on whose developers understand how to handle (and test!) the nuances of data convergence, consensus algorithms, split-brain avoidance, etc. We need systems that gracefully (and automatically) deal with and recover from network failures.

Full disclosure: I'm an engineer at FoundationDB