Hacker News
by pmalynin 3742 days ago
The problem is, for an early stage startup incidents like this are deadly. Especially since we just applied to a bunch of accelerators.
5 comments

I get that this is a pain in the ass. I've got a significant chunk of infrastructure on DO, I've got work to do today that depends on those machines. I learned about this simultaneously when a deployment failed and I got a text from an engineer at a company I consult with. Not a great way to start the day, for sure.

Know what I'm going to do? I'm going to have a cup of coffee and play with my dogs for a bit. It's inconvenient, it's going to delay things, and I'm a bit choked about it. But it's not worth getting angry over, because there's nothing I can do about it today.

The bottom line is, for any app/startup/business everything is a risk, and you didn't take the edge case of "What happens if the primary DNS nameserver for my domain goes down?" into account. Is blaming DO really all you can do?

If your app goes down do you have failover for that? Or do you blame your devops team?

Any small business has to manage which risks it accepts. OP got the Digital Ocean service to try and mitigate this risk to a degree. Beyond that, it becomes a question of which to accept: risk in other aspects of the product, or the risk that your primary DNS provider will fail.

The reality is, you have limited development and financial resources so you simply can't do everything. Sure, in a vacuum, or in a larger enterprise, we'd love to manage every risk. But when we're just starting out that's not realistic, and we do have a right to be upset at Digital Ocean's service going down while at the same time realizing that yes, ideally we would / should have had redundancy in place already.

Your question on blaming the devops team is exactly the mindset of someone who has a lot more resources than a brand new startup. In a brand new startup there IS NO devops team. If you're lucky, there is one person who does the devops work part time, balanced with a bunch of other development work he/she also does.

We have auto failover for server, app, and database failures -- this can be easily managed. DNS nameserver failover should have been built in. After all, there is a reason we specify 3 DNS nameservers in the domain configuration. Since Digital Ocean took on this task (we used Gandi's DNS before that), we expected it to perform as advertised -- so yes, blame lies with DO.
It's still entirely your fault. Something like Route53/Cloudflare is dirt cheap and crazy redundant. Don't risk your business on free/side services.
You can go to your domain registrar and switch to another DNS provider (GoDaddy has their own DNS service).
Be happy that this happened early. Now you know that you should never ever have a single point of failure.
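
For anyone wanting to check their own domain for the single point of failure discussed in this thread, here is a rough sketch (not anything a poster above actually ran). It takes the nameserver hostnames configured for a domain and groups them by provider. The function names are made up for illustration, and "provider" is approximated by the last two labels of the hostname, which is an assumption: a provider that spreads its nameservers across several domains would be miscounted, as would hostnames under multi-part suffixes. `ns1.example-dns.net` is a placeholder for a second provider.

```python
from collections import defaultdict


def provider_of(ns_host: str) -> str:
    """Naive provider key: the last two labels of the nameserver hostname.

    Assumption for illustration only; a real check would need a
    public-suffix-aware lookup.
    """
    labels = ns_host.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def group_by_provider(ns_hosts):
    """Map each provider key to the nameserver hostnames that fall under it."""
    groups = defaultdict(list)
    for host in ns_hosts:
        groups[provider_of(host)].append(host)
    return dict(groups)


def has_single_point_of_failure(ns_hosts) -> bool:
    """True when every listed nameserver maps to the same provider (or none are listed)."""
    return len(group_by_provider(ns_hosts)) < 2


# Three nameservers, one provider: an outage at that provider takes out all three.
single = ["ns1.digitalocean.com", "ns2.digitalocean.com", "ns3.digitalocean.com"]

# Hypothetical mixed setup with a placeholder second provider.
mixed = ["ns1.digitalocean.com", "ns1.example-dns.net"]

print(has_single_point_of_failure(single))
print(has_single_point_of_failure(mixed))
```

Listing three nameservers only protects against one server failing. If all three sit with one provider, a provider-wide outage still takes the domain offline.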