by bobfunk 1911 days ago
A full RCA with the steps the team has taken to improve this setup will be coming soon. The main issue with AWS's DNS solution, in this context, is that they don't support ALIAS records or similar techniques (CNAME flattening, etc.) for A records pointing to any external provider. That limits our options a lot in terms of what we can do, since anyone using this setup needs to point all their traffic to one or more fixed IP addresses.

Our current solution for the free/self-serve tier of Netlify has been to rely on Google's load balancer product to give people a stable IP pointing to a highly available solution. In light of recent issues, our team has set up a new permanent IP for A records (75.2.60.5) backed by a different solution, but due to the way DNS providers with no ALIAS record support work, it does require our customers to manually change their A records.
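
For anyone who wants to check whether a domain still needs that manual change, here is a minimal sketch in Python. It assumes the apex should resolve to exactly the one new IP quoted above; the function names and the injectable `resolver` parameter are made up for illustration and are not part of any Netlify tooling.

```python
import socket

# The new permanent IP quoted in the comment above.
NEW_A_RECORD_IP = "75.2.60.5"


def a_records(hostname, resolver=socket.gethostbyname_ex):
    """Return the sorted IPv4 addresses the hostname currently resolves to.

    `resolver` is injectable (hypothetical parameter) so the lookup can be
    replaced in tests; by default it uses the system resolver.
    """
    _canonical_name, _aliases, addresses = resolver(hostname)
    return sorted(addresses)


def needs_update(addresses, expected=NEW_A_RECORD_IP):
    """True if the A records are anything other than exactly the expected IP."""
    return list(addresses) != [expected]
```

Calling `needs_update(a_records("example.com"))` reports whether the records still need changing. Note that the answer comes from whatever resolver the machine uses, so a recently changed record can still show the old value until its TTL runs out.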

I totally get that moving DNS providers is a big deal, and we want to give the best experience we can regardless of what provider you're on. But we have to work within the technical limitations of those providers, and it's the nature of things that we have more options to deliver a completely seamless experience when we operate both the DNS and the edge layer for customers.


Route 53 General Manager here. Flattening of external provider CNAMEs has a number of availability and accuracy risks. Route 53 offers a 100% availability SLA, and we really mean it. We’ve heard over and over from customers that reliability is our most valuable feature. We can’t provide that same reliability when external queries are in the mix; if we query asynchronously then features such as geo-based routing don’t work as expected for customers. If we query synchronously, then latency and availability are impacted directly.

We do offer ALIAS records between Route 53 hosted zones, and this capability is open to providers such as Netlify. We’d be happy to have customers ALIAS to a hosted zone managed and updated by Netlify. It sounds like your IP addresses are relatively stable, so keeping these in sync doesn’t sound like it would be a big deal, and it would give you a lever you could pull to change your customer DNS quickly in an event such as this. You could also configure health checks on your own DNS records, which any customer ALIAS records that point to your DNS records in Route 53 would inherit.
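
As a rough illustration of what the customer side of that could look like, the sketch below builds a Route 53 change batch for an ALIAS A record. The record name, target name, and hosted zone ID are hypothetical placeholders; nothing in this thread says what a Netlify-managed zone would actually be called. `EvaluateTargetHealth` is set to true on the assumption that this is how the customer record would pick up health checks configured on the target records.

```python
def alias_change_batch(record_name, target_dns_name, target_zone_id):
    """Build a change batch that upserts an ALIAS A record.

    All three arguments are placeholders supplied by the caller:
    - record_name: the customer's record, e.g. their apex domain
    - target_dns_name: a record in the provider-managed hosted zone
    - target_zone_id: the ID of that provider-managed hosted zone
    """
    return {
        "Comment": "Point apex at a provider-managed Route 53 record",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "AliasTarget": {
                        "HostedZoneId": target_zone_id,
                        "DNSName": target_dns_name,
                        # Follow the health of the target records.
                        "EvaluateTargetHealth": True,
                    },
                },
            }
        ],
    }
```

With boto3, the resulting dict would be passed as `ChangeBatch` to `change_resource_record_sets` on a `route53` client, together with the customer's own `HostedZoneId`. An ALIAS record set carries no TTL or `ResourceRecords` of its own, which is why neither appears here.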

If you’re interested in going this route, please contact me at alecpete <at> amazon <dot> com.

If each Route 53 POP is already close to the querying DNS client, then might things like geo routing with cached answers just work well enough in most cases, with each POP having its own cache?

Auto-refreshing the popular records in the background before the TTL expires to help smooth over any temporary issues?
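
To make the idea concrete, here is a minimal sketch of a cache that re-resolves entries shortly before their TTL runs out. It is only an illustration of the prefetch technique, not a claim about how Route 53 or any other provider works. The class name, the `resolve` callable (name to `(addresses, ttl_seconds)`), and the 0.8 refresh fraction are all assumptions, and for brevity it refreshes every cached name rather than tracking which ones are popular.

```python
import time


class PrefetchingCache:
    """Cache of resolved targets that can be refreshed before TTL expiry."""

    def __init__(self, resolve, refresh_fraction=0.8, clock=time.monotonic):
        self._resolve = resolve              # name -> (addresses, ttl_seconds)
        self._refresh_fraction = refresh_fraction
        self._clock = clock                  # injectable for testing
        self._entries = {}                   # name -> (addresses, fetched_at, ttl)

    def get(self, name):
        """Serve from cache while the TTL is valid, else resolve upstream."""
        entry = self._entries.get(name)
        if entry is not None:
            addresses, fetched_at, ttl = entry
            if self._clock() - fetched_at < ttl:
                return addresses
        return self._store(name)

    def _store(self, name):
        addresses, ttl = self._resolve(name)
        self._entries[name] = (addresses, self._clock(), ttl)
        return addresses

    def refresh_due(self):
        """Re-resolve entries that have used up refresh_fraction of their TTL.

        Meant to be called periodically from a background task. If the
        upstream lookup fails, the old answer is kept until its TTL expires.
        """
        refreshed = []
        for name, (_addresses, fetched_at, ttl) in list(self._entries.items()):
            if self._clock() - fetched_at >= ttl * self._refresh_fraction:
                try:
                    self._store(name)
                    refreshed.append(name)
                except Exception:
                    pass  # temporary upstream issue: keep serving the old entry
        return refreshed
```

With a 60 second TTL, an entry fetched at t=0 becomes due for refresh at t=48, so a brief upstream failure around that time leaves about 12 seconds of valid cached answer to retry against.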

Other big name DNS providers have ALIAS type records. I imagine according to the SLA, AWS Route 53 is still "available", even if it can't resolve a "target address record" (as the ANAME draft calls them) but Route 53 is still able to respond.