by azundo 3706 days ago
When we ran into this exact same issue, we were told that ELBs are explicitly not designed for long-running connections. So know that if you run long-running connections through ELBs, you will always be working around this design constraint.

There's another case that the article doesn't really discuss (though there is evidence of it at the beginning, when all connections drop simultaneously): the ELB nodes themselves scale vertically at a particular threshold. I believe the setup described is still vulnerable to those scaling events.

2 comments

We definitely observed such drops, which we presume were caused by internal ELB scaling activity, but they happen so rarely that for the moment they haven't been a real issue. The one described in the article, by contrast, happened consistently at every deployment in our test environment.
Yeah, we've decided to live with the internal ELB scaling risks for the moment as well. We had the exact same situation: once we were at a certain scale, a deployment without gradual connection draining (even if we kept an instance in service in every AZ) would cause the ELBs to scale and drop all of our connections every time. It definitely caused us a fair amount of confusion, as it would happen minutes after the deploy, when everything seemed to have calmed down again.
The author said he needed at least 2 instances in an AZ to avoid the bug, and is using that as his workaround in the meantime while Amazon works on the bug.
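For anyone wanting to guard that workaround before a deploy, here is a rough sketch of a pre-flight check. It assumes you already have a mapping of in-service instance IDs to AZs from somewhere; the function name, the `fleet` data, and the instance IDs are all made up for illustration and are not from the article.

```python
from collections import Counter

def under_provisioned_azs(instance_azs, min_per_az=2):
    """Return the AZs that have fewer than min_per_az in-service instances.

    instance_azs: dict mapping instance id -> availability zone name.
    Note: an AZ with zero instances never appears in the mapping,
    so this check cannot flag it.
    """
    counts = Counter(instance_azs.values())
    return sorted(az for az, n in counts.items() if n < min_per_az)

# Hypothetical inventory: two instances in one AZ, only one in the other.
fleet = {
    "i-aaa": "us-east-1a",
    "i-bbb": "us-east-1a",
    "i-ccc": "us-east-1b",
}
print(under_provisioned_azs(fleet))  # prints ['us-east-1b']
```

A deploy script could refuse to take an instance out of service while this list is non-empty, though whether that is enough to avoid the drops is exactly what the article is about.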
Really interesting. Only two weeks ago we were told by an AWS Architect that "if you need persistent TCP connections to servers avoid the ELB and connect straight with your [scaled] EC2 instances". This was for a higher load scenario though.