	by y0y 2609 days ago
	DNS is also inherently distributed. This should make it resilient to all of the most common outage scenarios, and is likely why AWS offers a 100% uptime SLA for Route 53. I'll be interested in the post-mortem from Azure on this one.

deathanatos 2609 days ago

> likely why AWS offers a 100% uptime SLA for Route 53

Well, that's interesting. We occasionally see getaddrinfo() calls fail, claiming that domains don't exist even though we know they exist at the time of the failure (because the records are completely static). (We don't have a reproducible case for this yet, and it's incredibly rare for any given VM/service. But across our fleet, it crops up fairly regularly.)

donavanm 2609 days ago

I worked on Route 53 for a few years. I can't speak to your specific issue; too much depends on your clients, your networks, and your resolvers. But at a minimum, turn on query logging. You should get a timestamp, qname, and rtype, which is enough to pin down the NXDOMAIN responses.
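A minimal sketch of pulling the NXDOMAIN answers out of such logs. It assumes the space-separated Route 53 public DNS query log layout (log version, timestamp, hosted zone ID, query name, query type, response code, protocol, edge location, resolver IP, EDNS client subnet); check the field order against the current AWS documentation before relying on it. The function name and the sample lines are made up for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    qname: str
    rtype: str
    edge_location: str
    resolver_ip: str


def nxdomain_entries(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield the log entries whose response code is NXDOMAIN.

    Assumed field order (space-separated):
    version timestamp zone_id qname qtype rcode protocol edge resolver_ip edns_subnet
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or malformed lines
        _ver, ts, _zone, qname, qtype, rcode, _proto, edge, resolver = fields[:9]
        if rcode == "NXDOMAIN":
            yield LogEntry(ts, qname, qtype, edge, resolver)


if __name__ == "__main__":
    # Hypothetical sample lines, not real log output.
    sample = [
        "1.0 2019-01-01T00:00:00.000Z Z0000EXAMPLE api.example.com A NOERROR UDP IAD89-C1 192.0.2.10 -",
        "1.0 2019-01-01T00:00:01.000Z Z0000EXAMPLE api.example.com A NXDOMAIN UDP IAD89-C1 192.0.2.10 -",
    ]
    for entry in nxdomain_entries(sample):
        print(entry.timestamp, entry.qname, entry.rtype, entry.resolver_ip)
```

Keep in mind that resolvers cache answers, including negative ones, so a matching NXDOMAIN entry may predate the client-side failure it explains.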

That said, the most common cause of an authoritative NXDOMAIN is adding or deleting records and querying them before propagation is complete. You may want to log or poll your RRset change status separately so you can correlate the two.
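One way to do that polling, sketched under the assumption that `client` behaves like a boto3 Route 53 client, whose `get_change(Id=...)` returns a `ChangeInfo` dict with a `Status` of `PENDING` or `INSYNC`. The function name, poll interval, and timeout are illustrative choices; `sleep` and `clock` are injectable so the loop can be exercised without AWS.

```python
import time
from typing import Any, Callable


def wait_for_insync(
    client: Any,
    change_id: str,
    poll_interval: float = 5.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll a Route 53 change until it reports INSYNC.

    Assumes client.get_change(Id=...) returns {"ChangeInfo": {"Status": ...}},
    as a boto3 Route 53 client does. Returns the seconds spent waiting and
    raises TimeoutError if the change is still pending after `timeout`.
    """
    start = clock()
    while True:
        status = client.get_change(Id=change_id)["ChangeInfo"]["Status"]
        elapsed = clock() - start
        # Log every poll so change status can be lined up with client-side failures.
        print(f"{elapsed:7.1f}s change={change_id} status={status}")
        if status == "INSYNC":
            return elapsed
        if elapsed >= timeout:
            raise TimeoutError(f"{change_id} still {status} after {elapsed:.0f}s")
        sleep(poll_interval)
```

With boto3 this would be called with `boto3.client("route53")` and the `ChangeInfo["Id"]` returned by `change_resource_record_sets`; boto3 also ships a `resource_record_sets_changed` waiter if you only need to block, not log.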

The other cause is that, depending on the network, intermediate DNS tampering happens all the time. QNAME, RNAME, and RTYPE all get modified. Responses and queries are duplicated, intercepted, and manipulated. There's some good research on this out of DNS-OARC and from a dude in Australia (IIRC).
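A cheap signal for that kind of rewriting is to check that a response echoes back the exact question that was sent. The sketch below builds a wire-format query and compares the ID, the raw QNAME bytes (so a mixed-case name that comes back lowercased counts as a mismatch, the idea behind the "0x20" case-randomization trick), and QTYPE. All function names are hypothetical, sending and receiving over the network is left out, and a mismatch is only a hint of tampering, not proof.

```python
import struct
from typing import Tuple


def encode_qname(name: str) -> bytes:
    """Encode a domain name as DNS wire-format length-prefixed labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(txid: int, name: str, qtype: int) -> bytes:
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    return header + encode_qname(name) + struct.pack("!HH", qtype, 1)  # class IN


def parse_question(msg: bytes) -> Tuple[int, bytes, int]:
    """Return (ID, raw QNAME bytes, QTYPE) of the first question.

    Assumes the question name is uncompressed, the normal case for the
    question section.
    """
    txid = struct.unpack("!H", msg[:2])[0]
    pos = 12  # the question section starts right after the 12-byte header
    while msg[pos] != 0:
        pos += 1 + msg[pos]
    qname = msg[12:pos + 1]
    qtype = struct.unpack("!H", msg[pos + 1:pos + 3])[0]
    return txid, qname, qtype


def echoes_question(query: bytes, response: bytes) -> bool:
    """True if the response repeats the query's ID, QNAME byte-for-byte, and QTYPE."""
    return parse_question(query) == parse_question(response)
```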

cthalupa 2609 days ago

> We occasionally see getaddrinfo() calls fail claiming domains that we know exist at the failure time (b/c the records are completely static) don't exist.

That could be the resolvers you're hitting failing, rather than an issue with the Route 53 authoritative nameservers, though. For example, the resolving DNS servers in EC2 are not actually part of Route 53.
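A first step in telling the two apart is checking which resolver a host actually queries. A minimal sketch, assuming a Linux-style `/etc/resolv.conf`; the Amazon-provided DNS server answers at 169.254.169.253 and, inside a VPC, at the VPC range base address plus two. On hosts running a local stub such as systemd-resolved, the file may list only 127.0.0.53, and the real upstream is one hop further. The function name is hypothetical.

```python
from pathlib import Path
from typing import List

# Link-local address of the Amazon-provided DNS server.
AMAZON_DNS_LINK_LOCAL = "169.254.169.253"


def nameservers(resolv_conf: str) -> List[str]:
    """Return the nameserver addresses listed in resolv.conf-style text, in order."""
    servers = []
    for raw in resolv_conf.splitlines():
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()  # drop comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


if __name__ == "__main__":
    path = Path("/etc/resolv.conf")
    if path.exists():
        for ns in nameservers(path.read_text()):
            note = " (Amazon-provided DNS)" if ns == AMAZON_DNS_LINK_LOCAL else ""
            print(ns + note)
```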

deathanatos 2609 days ago

I'd think that would correspond to EAI_AGAIN or EAI_FAIL, whereas I'm pretty sure we're getting an EAI_NONAME.
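Since the distinction here is which error code getaddrinfo() returns, it helps to log that code with a timestamp every time a lookup fails. A minimal sketch in Python, where `socket.gaierror` carries the EAI_* code as `errno`; the `resolve()` wrapper, its port default, and the log format are illustrative assumptions, not anything from the thread.

```python
import socket
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# Codes that point at a transient or resolver-side problem, as opposed to
# EAI_NONAME ("this name does not exist").
TRANSIENT = {socket.EAI_AGAIN, socket.EAI_FAIL}


def classify_gai_error(code: int) -> str:
    """Map a getaddrinfo() error code to a coarse category."""
    if code == socket.EAI_NONAME:
        return "no such name"
    if code in TRANSIENT:
        return "resolver failure"
    return "other"


def resolve(host: str, port: int = 443) -> Optional[List[Tuple]]:
    """Resolve host; on failure, print a timestamped, classified error and return None."""
    try:
        return socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        stamp = datetime.now(timezone.utc).isoformat()
        print(f"{stamp} host={host} errno={exc.errno} ({classify_gai_error(exc.errno)})")
        return None
```

Aggregating these lines across a fleet gives the timestamps needed to line failures up against resolver or authoritative query logs.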

leesalminen 2609 days ago

We’ve experienced the same thing. I’ve never been able to figure it out. If you ever do, please let me know! I’ll owe you a beer ;)

hfern 2609 days ago

You may be hitting EC2 DNS rate limits.
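If resolver throttling is the suspect, the usual mitigation is caching on the client so fewer queries leave the host; a local caching daemon such as nscd, dnsmasq, or systemd-resolved does this more thoroughly. Below is a minimal in-process sketch. getaddrinfo() does not expose record TTLs, so the `ttl` here is a hypothetical fixed value you choose, not the record's real TTL; the class name is also illustrative.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple

Resolver = Callable[[str, int], List[Tuple]]


class CachingResolver:
    """Cache getaddrinfo() results for a fixed TTL to cut DNS query volume."""

    def __init__(self, ttl: float = 30.0,
                 resolve: Resolver = socket.getaddrinfo,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._resolve = resolve
        self._clock = clock
        self._cache: Dict[Tuple[str, int], Tuple[float, List[Tuple]]] = {}

    def lookup(self, host: str, port: int) -> List[Tuple]:
        key = (host, port)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: no query sent
        result = self._resolve(host, port)  # errors propagate and are not cached
        self._cache[key] = (now, result)
        return result
```

Failures are deliberately not cached here, so a transient error is retried on the next call instead of being pinned for a full TTL.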

deathanatos 2609 days ago

I would expect EAI_FAIL or EAI_AGAIN, but I'm pretty sure we're getting EAI_NONAME.

But the stuff that hits this problem most often is of a quality level where that wouldn't terribly surprise me. AWS seems to "document" this as:

> The number of DNS queries per second supported by the Amazon-provided DNS server varies by the type of query, the size of response, and the protocol in use.

How specific.

el_duderino 2609 days ago

Do they typically provide a postmortem?

crankylinuxuser 2609 days ago

It's Microsoft. I'm sure they just rebooted it!

(I had to, see username!)

edit: Seriously, -3? It was a joke.
