| HN Mirror

by brentonator 2285 days ago

It was a bit of both. I don't think they had the infrastructure to support everyone, I think they had major growing pains as they tried to scale, and I don't think they had the experience to proactively identify sources of partial failure. That resulted in effective outages of 40-360 minutes that weren't as easy to fix as on AWS, where we can shift the stack to a new AZ in minutes.

DO had no cross-"AZ" networking. We tried to form a tunnel, but it was so unreliable that they admitted they needed their own backbone to support that use case. Even when that did come out, we'd still have had to use public IPs... no VPC-peering type of thing. Our only option for switching "AZ"s was to take downtime, and that wasn't acceptable.

We were able to increase capacity 300-500% on AWS for the same cost, so there's a lot going on in the equation for us. We needed a better ability to resize dynamically and more options for low-CPU, high-memory configurations. We also needed more flexibility with disks, especially network-attached ones like EBS, but given their network issues I understand why they don't have that.

1 comment

swyx 2284 days ago

this is all really good feedback/food for thought for me as I learn about what matters to people. thanks!