by lisper 3976 days ago
Certainly systems should be designed to be robust against failures. But encouraging this by deliberately producing failures in production seems like a bad idea to me. It's kind of like saying, "Let's see if the new hull design works by deliberately steering the boat into an iceberg!"
2 comments

A TCP socket teardown followed by a reconnect is hardly the equivalent of ramming a floating chunk of ice. There are a bunch of reasons you will see that teardown in practice, like NAT timeouts in a home router, or carrier-grade 6to4 NAT, or mobile devices rehoming to a new tower, or anywhere else that state is tied to the path.

Sure, this is a deliberately produced failure, but only in the sense that it reproduces a "normal" failure. This condition is to be expected on the internet, and this is simply one more place where it occurs.

Bad analogy. It's like saying "let's see if the new hull design works by deliberately running it into things in a test laboratory setting". Because, y'know, if you deploy an application to production using a particular network configuration (that is, using an ELB) without testing it in a development/staging environment first, you're doing a poor job.

This disconnect behavior is just a property of the system. Either you design your application to handle it, or you use a different system. (Not that you can get away with not handling disconnects even without ELBs.)
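
For what "design your application to handle it" might look like in practice, here is a minimal sketch, not anyone's actual implementation. The helper name `call_with_reconnect` and its parameters are made up for illustration. It assumes the operation is safe to retry (idempotent) and that the connection object has a `close()` method.

```python
import time


def call_with_reconnect(connect, operation, max_attempts=5,
                        base_delay=0.1, sleep=time.sleep):
    """Run operation(conn), reconnecting with exponential backoff.

    Hypothetical helper: `connect` returns a fresh connection object,
    `operation` uses it and returns a result. Only safe when the
    operation can be repeated without side effects piling up.
    """
    conn = None
    for attempt in range(max_attempts):
        try:
            if conn is None:
                conn = connect()
            return operation(conn)
        except (ConnectionError, TimeoutError):
            # The peer or something on the path (NAT, load balancer)
            # dropped the connection: discard it and start over.
            if conn is not None:
                try:
                    conn.close()
                except OSError:
                    pass
                conn = None
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            sleep(base_delay * 2 ** attempt)
```

With a real socket, `connect` could be something like `lambda: socket.create_connection((host, port))`. Passing `sleep` in as a parameter is only there so the backoff can be tested without waiting.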