Hacker News new | ask | show | jobs
by atonse 1296 days ago
We decided to retire the whole infra about 8 months ago. A lot of our consul complications happened about 18 months ago.

The consul clusters would keep failing in QA (they were running on t2.nanos – but that should be plenty of bandwidth for raft not to blow up every couple weeks, same happened with t2.micros too).

Before we pulled the plug we had started seeing something about ec2 health checks failing and the autoscaling groups yanking servers out and replacing them with new servers. but this is exactly the kind of case where consul should've just added the new machine right in. Instead, the 3 node cluster (now 2 nodes) would just sit there saying "hey I can't find a leader... aaaah. I can't find a leader" – well, to paraphrase Mike Myers on SNL, TALK AMONGST YUHSELVES and figure it out, there are two of you remaining.