Hacker News
by chronid 3741 days ago
DNS is hard. Very hard.

It may seem trivial when it works (hint: it's not), but some of the biggest fuck ups I've seen in my professional life were caused by strange DNS things happening or DNS servers going kaboom.

I feel the pain of the DO engineers trying to mitigate this issue. I really do.

3 comments

BS. DNS is a trivial thing to scale, compared to most other web-scale efforts.

Things break when people don't use 20-year-old best practices. There is no defense against inexperience and ignorance.

I took the OP's comment as "it's hard to understand DNS, and the biggest fuck ups happen because people think they understand DNS when they actually don't".

The problem with DNS is that it can work even when it is configured incorrectly. This makes people who have no idea what they are doing think that they actually understand it. The strange issues with DNS only happen with strange configurations. When you follow best practices everything is predictable.

All right. This I can agree with.
Please help the ignorant and provide a link to a description of those best practices.
> I feel the pain of the DO engineers trying to mitigate this issue. I really do.

Me too. Just last week they had another problem with DNS on the client side of things: Resolving with the Google Public DNS, which most droplets use by default, didn't work reliably. I hope that they post a combined post mortem for both of those incidents.

It's not hard; the problem is that everything relies on DNS, so when DNS goes down or has problems you get cascading failures.
That's why you use multiple providers.
I'll bite. You can have multiple NS records but only a single SOA. The .com registry minimum TTL for SOA is a day.

How in the world would "multiple providers" help you in a 6 hour outage?

You can have multiple NS records. You should have NS records that point to different companies' DNS servers, preferably on different continents.
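A rough way to sanity-check that advice is to group a zone's nameserver hostnames by operator. The sketch below is only an illustration: the function names are made up, the hostnames are placeholders, and the "provider" is guessed naively from the last two labels of each hostname, which gets suffixes like co.uk wrong. It does no DNS lookups; you would feed it whatever NS set your zone actually publishes.

```python
from collections import defaultdict


def group_by_provider(ns_hosts):
    """Group nameserver hostnames by a naive 'provider' key.

    The key is the last two DNS labels of each hostname. This is an
    assumption for illustration and misgroups suffixes such as 'co.uk'.
    """
    groups = defaultdict(list)
    for host in ns_hosts:
        labels = host.rstrip(".").lower().split(".")
        groups[".".join(labels[-2:])].append(host)
    return dict(groups)


def has_provider_diversity(ns_hosts, minimum=2):
    """Return True if the NS set spans at least `minimum` distinct keys."""
    return len(group_by_provider(ns_hosts)) >= minimum


# Hypothetical NS sets, not real providers.
single = ["ns1.dns-a.example.", "ns2.dns-a.example."]
mixed = ["ns1.dns-a.example.", "ns2.dns-a.example.", "ns1.dns-b.example."]

print(has_provider_diversity(single))  # False
print(has_provider_diversity(mixed))   # True
```

A check like this only tells you the delegation is spread out; it says nothing about whether the providers serve identical zone data.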
That's great for NS records. What about SOA?
SOA isn't used for resolving names, AFAIK.
Unless you use a better resolver than the standard glibc resolver on Linux (e.g. dnsmasq, bind or similar running locally, with resolv.conf pointing at it), you appear doomed to slow lookups whenever your first resolv.conf entry fails. Most of the resolv.conf options that might have helped (if you'd set them) simply don't work, or don't do anything particularly useful, in the versions shipped by the Linux distros I've tested.
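To see what a given machine is set up to do, something like the following can help. It is a minimal sketch, not a model of what glibc really does: it only reads the `nameserver` lines and the `timeout`/`attempts` options as described in resolv.conf(5) (documented defaults of 5 seconds and 2 attempts), and the helper names and sample config are made up for illustration. Real behaviour may differ between versions, which is the whole complaint above.

```python
import ipaddress


def parse_resolv_conf(text):
    """Return (nameservers, options) parsed from resolv.conf-style text."""
    nameservers = []
    # Defaults as documented in resolv.conf(5).
    options = {"timeout": 5, "attempts": 2}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] == "options":
            for opt in fields[1:]:
                name, _, value = opt.partition(":")
                if name in options and value.isdigit():
                    options[name] = int(value)
    return nameservers, options


def first_is_loopback(nameservers):
    """True if the first nameserver is a local (loopback) resolver."""
    if not nameservers:
        return False
    try:
        return ipaddress.ip_address(nameservers[0]).is_loopback
    except ValueError:
        return False


# Hypothetical config for illustration.
sample = """\
# example only
nameserver 8.8.8.8
nameserver 8.8.4.4
options timeout:1 attempts:2
"""

servers, opts = parse_resolv_conf(sample)
print(servers, opts)
print(first_is_loopback(servers))  # False: no local caching resolver in front
```

If the first entry is not a loopback address, every lookup is at the mercy of that first remote server and of the documented per-server timeout before the next one is tried.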
unbound is great for this.
Even with multiple providers, DNS issues are a cluster fuck.
Only if you don't know what you're doing. The problem with DNS is that it might work even when it is misconfigured, and misconfiguration is the source of strange issues.
I think that we all have areas where we don't know what we're doing. This is one of mine. With all the talk of how obvious/important/easy it is to have a failover in place in case this happens, I'm having trouble finding a good resource about setting up a redundant DNS. Running a droplet on Digital Ocean with Debian and Nginx.
Sounds like we're begging for someone to write a nice blog post for how they set up redundant DNS across multiple providers "the right way"... Sounds like it would hit the front page in short order if anyone is willing to share how they think this should be mitigated, and specifically how to expect common clients to behave when faced with the various types of outages that may occur!
Yeah I mean it takes work to do it correctly. I wouldn't call it a cluster fuck.
Suppose you have multiple providers, but one of them screws up and authoritatively denies the existence of all of your hosts?
That's what you keep an extremely low ttl for.
Which doesn't mean much when a nontrivial number of ISPs out there don't respect the TTL settings.

Source: Days-long service degradation caused by customer ISPs caching bad DNS information well beyond the 10 minute TTL we had set.
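The effect is easy to picture with a toy cache. This is a sketch under stated assumptions, not how any particular ISP resolver works: the `ClampingCache` class, the one-day floor, and the record values are all hypothetical. It just shows that a cache which raises every TTL to its own minimum keeps serving a record long after the 10 minute TTL the zone owner asked for.

```python
class ClampingCache:
    """Toy resolver cache that enforces a minimum TTL on every record."""

    def __init__(self, min_ttl=0):
        self.min_ttl = min_ttl
        self._entries = {}  # name -> (value, expiry time in seconds)

    def put(self, name, value, ttl, now):
        # A TTL-respecting cache has min_ttl == 0, so this is a no-op there.
        effective_ttl = max(ttl, self.min_ttl)
        self._entries[name] = (value, now + effective_ttl)

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None or now >= entry[1]:
            return None  # missing or expired: would trigger a fresh lookup
        return entry[0]


honest = ClampingCache()                 # respects the published TTL
stubborn = ClampingCache(min_ttl=86400)  # hypothetical one-day floor

for cache in (honest, stubborn):
    cache.put("www.example.", "192.0.2.1", ttl=600, now=0)

# Eleven minutes later the record has changed upstream.
print(honest.get("www.example.", now=660))    # None, so it re-resolves
print(stubborn.get("www.example.", now=660))  # 192.0.2.1, the stale answer
```

With a floor like that, a bad answer cached once stays around for the full day regardless of what TTL you published.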