by pmalynin 3741 days ago
Yeah, tried to access our site and it was down. Really was expecting more out of Digital Ocean than to fuck up such an integral part of their infrastructure. In the future we'll be transitioning away from their DNS solution because this is unacceptable.
3 comments

I hope your clients/users are as understanding and civil as you are.

In the meantime, I'm going to wait for the post-mortem before deciding if I should continue using them for DNS. Looking back over the status history, 1-2 incidents a year isn't that bad for my needs, but might be too much for you, which is fine (since I'm only hosting a couple of small side projects with them).

The problem is, for an early-stage startup, incidents like this are deadly. Especially since we just applied to a bunch of accelerators.
I get that this is a pain in the ass. I've got a significant chunk of infrastructure on DO, and I've got work to do today that depends on those machines. I learned about this when a deployment failed and, at the same moment, I got a text from an engineer at a company I consult with. Not a great way to start the day, for sure.

Know what I'm going to do? I'm going to have a cup of coffee and play with my dogs for a bit. It's inconvenient, it's going to delay things, and I'm a bit choked about it. But it's not worth getting angry over, because there's nothing I can do about it today.

The resolution is that for any app/startup/business, everything is a risk. If you didn't take the edge case of "What happens if the primary DNS nameserver for my domain goes down?" into account, is all you can do blame DO?

If your app goes down do you have failover for that? Or do you blame your devops team?

Any small business has to manage which risks it accepts. OP got the Digital Ocean service to try and mitigate this risk to a degree. Beyond that, it becomes a question of whether to accept risk in other aspects of the product or the risk that your primary DNS provider will fail.

The reality is, you have limited development and financial resources, so you simply can't do everything. Sure, in a vacuum, or in a larger enterprise, we'd love to manage every risk. But when we're just starting out that's not realistic, and we do have a right to be upset at Digital Ocean's service going down while at the same time realizing that yes, ideally we would / should have had redundancy in place already.

Your question on blaming the devops team is exactly the mindset of someone who has a lot more resources than a brand new startup. In a brand new startup there IS NO devops team. If you're lucky, there is one person who does the devops work part time, balanced with a bunch of other development work he/she also does.

We have auto failover for server, app, and database failures -- this can be easily managed. DNS nameserver failover should have been built in; after all, there is a reason we specify 3 DNS nameservers in the domain configuration. Since Digital Ocean took on this task (we used Gandi's DNS before that), we expected it to perform as advertised -- so yes, blame lies with DO.
It's still entirely your fault. Something like Route53/Cloudflare is dirt cheap and crazy redundant. Don't risk your business on free/side services.
You can go to your domain registrar and switch to another DNS provider (GoDaddy has their own DNS service).
Be happy that this happened early. Now you know that you should never ever have a single point of failure.
Just add a second DNS provider.
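
For anyone who wants to sanity-check their own setup, here is a rough Python sketch of the point made in this thread: listing three nameservers does not help if all three belong to the same provider. It only inspects hostnames you pass in (no DNS lookups). The function names are made up for illustration, `ns1.example-dns.net` is a hypothetical second provider, and grouping by the last two labels of the hostname is a crude assumption that will misgroup providers who spread their nameservers across several domains.

```python
from collections import defaultdict


def group_by_provider(nameservers):
    """Group nameserver hostnames by a rough provider key.

    Assumption: the last two labels of the hostname (e.g. "digitalocean.com")
    identify the provider. This is a heuristic, not a reliable rule.
    """
    groups = defaultdict(list)
    for ns in nameservers:
        # Drop a trailing root dot and normalise case before splitting.
        labels = ns.rstrip(".").lower().split(".")
        provider = ".".join(labels[-2:])
        groups[provider].append(ns)
    return dict(groups)


def has_provider_redundancy(nameservers):
    """Return True if the nameservers span at least two provider keys."""
    return len(group_by_provider(nameservers)) >= 2


if __name__ == "__main__":
    single = [
        "ns1.digitalocean.com",
        "ns2.digitalocean.com",
        "ns3.digitalocean.com",
    ]
    # "ns1.example-dns.net" is a hypothetical second provider.
    mixed = single + ["ns1.example-dns.net"]

    print(has_provider_redundancy(single))
    print(has_provider_redundancy(mixed))
```

Whether a registrar and both providers actually support serving the same zone side by side is something to confirm with them; the sketch only flags the single-provider case.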
If your site is so critical that it can't suffer any downtime then why is it not provisioned across multiple independent platforms?
I also hope the users of your site understand that shit happens. Also as another user said... if DNS is so critical for you then why don't you have proper failover in place?