by davidjgraph 1794 days ago
Serious question, has anyone properly solved the issue of DNS as a single point of failure?
9 comments

Depending on where you draw the line for "single point of failure", you could use multiple providers for your DNS.

GOV.UK for example uses both aws and gcp for DNS

So, NS entries pointing to both? But then take the example where your domain is in Route53 and AWS goes down. You can't reconfigure the NS entries to avoid the AWS DNS servers. Is the idea that child DNS servers detect the outage and cache the values from the name server(s) that remain up?

But then, the cached values from AWS take a while to clear, TTL never seems to be applied properly. It always feels like the worst case in such a scenario is you can point everyone at the right thing within 24 hours.

Configuring two NS entries is pretty standard, so surely most resolvers try one of the two, and if it's down try the other one? What else would be the point of having multiple nameservers? Then you just have to get two nameserver providers and make sure their settings stay synced, and point your domain to one nameserver from each.

Of course that requires the server to properly fail, i.e. stop responding to requests. That doesn't seem to be the case here.

You set both services in your ns records. So every day they share the load for dns resolution. If one day one of them is down the client can/will use a different nameserver from your configuration.
Have them all hot and live rather than any sort of failover system. Keep everything in sync with OctoDNS or similar

https://github.com/octodns/octodns
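
To make the "keep everything in sync" part concrete, here is a minimal sketch of a drift check between two providers. It does not use OctoDNS or any provider API: the zones are hypothetical in-memory dicts, `find_drift` is an illustrative helper, and the names and addresses are example values from documentation ranges.

```python
from typing import Dict, Set, Tuple

# A zone is modelled as {(name, record_type): set of values}.
Zone = Dict[Tuple[str, str], Set[str]]


def find_drift(zone_a: Zone, zone_b: Zone) -> Dict[Tuple[str, str], Tuple[Set[str], Set[str]]]:
    """Return every (name, type) whose values differ between the two zones."""
    drift = {}
    for key in sorted(set(zone_a) | set(zone_b)):
        a_values = zone_a.get(key, set())
        b_values = zone_b.get(key, set())
        if a_values != b_values:
            drift[key] = (a_values, b_values)
    return drift


# Hypothetical example data, not real records.
provider_one: Zone = {
    ("www.example.com.", "A"): {"192.0.2.10"},
    ("example.com.", "MX"): {"10 mail.example.com."},
}
provider_two: Zone = {
    ("www.example.com.", "A"): {"192.0.2.10", "192.0.2.11"},
    ("example.com.", "MX"): {"10 mail.example.com."},
}

for key, (a, b) in find_drift(provider_one, provider_two).items():
    print(key, "differs:", sorted(a), "vs", sorted(b))
```

In practice you would fill the two dicts from each provider's export or API and fail a CI job if the result is non-empty.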

DNS is fastest first* rather than main/failover. If AWS DNS was down your GCP DNS would have replied (if all is well) sooner than {timeout} so your visitor would still have a response

* Sort of. I think if the client doesn't get a reply from the server it picked randomly in 1s they move on to the next server, repeat until all fail
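
A rough sketch of the behaviour described above, assuming the resolver shuffles the nameservers and moves on after a timeout. The 1s default is the guess from the comment, not a standard, and `fake_query` is a hypothetical stand-in for a real network query; real resolvers differ in the details.

```python
import random
from typing import Callable, List, Optional


def resolve_with_failover(
    nameservers: List[str],
    query: Callable[[str, float], Optional[str]],
    timeout: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try nameservers in random order; return the first answer, else None."""
    order = list(nameservers)
    (rng or random).shuffle(order)
    for server in order:
        answer = query(server, timeout)  # None models "no reply within timeout"
        if answer is not None:
            return answer
    return None  # every server failed


# Hypothetical outage: provider A's nameserver never answers.
DOWN = {"ns1.provider-a.example."}


def fake_query(server: str, timeout: float) -> Optional[str]:
    return None if server in DOWN else "192.0.2.10"


servers = ["ns1.provider-a.example.", "ns1.provider-b.example."]
print(resolve_with_failover(servers, fake_query))
```

Whichever server is picked first, the lookup still succeeds as long as one provider answers; the cost of the outage is at most one timeout per dead server.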

I think if Route53 was down, your DNS provider wouldn't be able to reach it. So it would go to the root, which would give it the GCP one too, and your DNS provider might try that.

(I don't know if this is how it works, but I think that's how it's supposed to work)

You typically have four name servers for a domain, but they don’t all have to be hosted with the same company. Very handy when your DNS provider decides to brag they are unhackable and the hackers reply by immediately hacking them followed by DDoSing them to death.
gov.uk's traffic seems to be handled by Fastly, a well known CDN.

What I'm a bit surprised / unsure of is what happens when I run "dig ns gov.uk". The results are:

  gov.uk.     21559 IN  NS  ns1.surfnet.nl.
  gov.uk.     21559 IN  NS  auth50.ns.de.uu.net.
  gov.uk.     21559 IN  NS  ns3.ja.net.
  gov.uk.     21559 IN  NS  ns2.ja.net.
  gov.uk.     21559 IN  NS  ns0.ja.net.
  gov.uk.     21559 IN  NS  auth00.ns.de.uu.net.
  gov.uk.     21559 IN  NS  ns4.ja.net.
Who are ja.net, uu.net and surfnet.nl?

EDIT: I see that ja.net i.e. jisc.ac.uk "manages the second level domain .gov.uk" -- https://www.jisc.ac.uk/domain-registry . I imagine that uu.net and surfnet.nl are there for redundancy

Ah sorry, you're indeed right. Turns out it was just the .service.gov.uk domain that uses GCP and AWS - I just thought that applied to the parent domain too.

  $ dig NS service.gov.uk +short

  ns-cloud-e4.googledomains.com.
  ns-cloud-e3.googledomains.com.
  ns-cloud-e2.googledomains.com.
  ns-cloud-e1.googledomains.com.
  ns-831.awsdns-39.net.
  ns-1983.awsdns-55.co.uk.
  ns-117.awsdns-14.com.
  ns-1080.awsdns-07.org.
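
A quick way to check that a delegation really spans more than one provider is to group the NS hostnames. This is only a sketch: the hostnames are copied from the output above, while the pattern-to-provider mapping in `provider_of` is an assumption made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# NS hostnames copied from the dig output above.
NS_RECORDS = [
    "ns-cloud-e4.googledomains.com.",
    "ns-cloud-e3.googledomains.com.",
    "ns-cloud-e2.googledomains.com.",
    "ns-cloud-e1.googledomains.com.",
    "ns-831.awsdns-39.net.",
    "ns-1983.awsdns-55.co.uk.",
    "ns-117.awsdns-14.com.",
    "ns-1080.awsdns-07.org.",
]


def provider_of(hostname: str) -> str:
    """Guess the provider from the hostname (assumed patterns, not exhaustive)."""
    host = hostname.rstrip(".").lower()
    if host.endswith(".googledomains.com"):
        return "google"
    if ".awsdns-" in host:
        return "aws"
    return "unknown"


def group_by_provider(hostnames: List[str]) -> Dict[str, List[str]]:
    groups = defaultdict(list)
    for hostname in hostnames:
        groups[provider_of(hostname)].append(hostname)
    return dict(groups)


groups = group_by_provider(NS_RECORDS)
for provider, hosts in groups.items():
    print(provider, len(hosts))
```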

  whois ja.net
    Domain Name: JA.NET
    Registry Domain ID: 499794_DOMAIN_NET-VRSN
    Registrar WHOIS Server: whois.demys.com
    Registrar URL: http://www.demys.com
"Demys is a leading provider of corporate domain name management and an ICANN accredited registrar"

  whois uu.net
    Domain Name: UU.NET
    Registry Domain ID: 5486163_DOMAIN_NET-VRSN
    Registrar WHOIS Server: whois.markmonitor.com

surfnet is just an ISP in the Netherlands

https://www.surf.nl/

Thanks

Is it possible to see if/where gov.uk is using GCP or AWS for its domain zones? From what I can see -- that's not the case? Or am I looking in the wrong place?

I think you did the right query, maybe they're using it for different domain names?
Last time I tried setting NS to both cloudflare and digital ocean in my domain registry, cloudflare sent me an email saying the configuration is invalid and asked me to revert. Am I doing something wrong?
No, you have done everything right. At least from the point of view of DNS. That you can not use multiple nameservers is a limitation of Cloudflare (limit in the sense of: Cloudflare can only offer their services in the Free and Pro plan if they have full control over all nameservers).
Thank you. I will look into alternative services on the thread then.
And then there are Cloudflare and other Centralized Downtime Networks as another point of failure.
Loled at this.
It is relatively easy to make DNS highly redundant: just put multiple DNS servers in data centers which are as independent as possible (different geo locations, different ISPs). You can also use different DNS software and different OSes (say BSD+Linux) to exclude correlated bugs. Root DNS servers AFAIK use different software for this reason.

Problems start when you want to easily make frequent changes and introduce complex software to manage DNS zones (and complexity usually comes with bugs).

The problem isn't DNS though, is it? The problem is that people don't necessarily use the redundancies on DNS?

The whole reason it takes a domain 24h to fully work with DNS is because it propagates the information to other DNS servers, thus making it not a centralized service.

DNS doesn't 'propagate' except in the very limited case of zone-transfer publication, which... nobody really relies on these days. Registrars tell you it takes 24 hours to propagate to stop you from complaining to them about your ISP's DNS caching policy. The reality is: recursing DNS servers have caches, they respect TTLs, and for the most part that means that DNS changes should fully wash through within an hour for most changes (less if you keep your TTLs shorter).
That differs per TLD though. In .nl updates are usually fully processed within the hour (they update the zone file twice per hour)
More accurately there are distributed caches, which expire on a simple timer basis, as opposed to updates being pushed immediately.
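
As a toy illustration of timer-based expiry (a sketch, not how any particular resolver is implemented): the `TtlCache` class and the injected fake clock are made up for this example, and the address is from a documentation range.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Answers expire on a timer; nothing is ever pushed to the cache."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, value: str, ttl: float) -> None:
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: caller must re-query upstream
            return None
        return value


# A fake clock makes the expiry visible without sleeping.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("www.example.com.", "192.0.2.10", ttl=300)

now[0] = 299.0
print(cache.get("www.example.com."))  # still served from cache
now[0] = 300.0
print(cache.get("www.example.com."))  # TTL elapsed, entry is gone
```

A change made at the authoritative server only becomes visible to a client of this cache once the old entry has timed out, which is why a shorter TTL means a faster "wash through".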

Relatively short TTLs are ubiquitous these days though.

It's an interesting question, as it's always been solved on the server side. All of the current problem is client side. That is, client resolvers that aren't using diverse providers, and only do things like round-robin with long timeouts.
Anycast for the DNS IPs deals with most of the problems of clients not failing over elegantly when their primary DNS server is broken.
From a client (DNS recursor) point of view there is no primary server. There are just multiple NS records which are equal. If one of them is down it can introduce resolving delays, but they are usually small. At least if something like Unbound or Bind is used. Unbound, e.g., maintains an infra-cache where it tracks RTT and errors for each server and avoids servers which are down.
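
To give a feel for the idea of tracking RTT and errors per server, here is a simplified sketch. It is not Unbound's actual algorithm: `ServerStats`, the smoothing factor `ALPHA` and the `MAX_FAILURES` threshold are all assumptions chosen for illustration.

```python
from typing import Dict, List

ALPHA = 0.25       # assumed smoothing factor for the RTT average
MAX_FAILURES = 3   # assumed number of consecutive timeouts before avoiding a server


class ServerStats:
    """Track a smoothed RTT and consecutive timeouts per nameserver."""

    def __init__(self) -> None:
        self.rtt_ms: Dict[str, float] = {}
        self.failures: Dict[str, int] = {}

    def record_success(self, server: str, rtt_ms: float) -> None:
        previous = self.rtt_ms.get(server)
        if previous is None:
            self.rtt_ms[server] = rtt_ms
        else:
            self.rtt_ms[server] = (1 - ALPHA) * previous + ALPHA * rtt_ms
        self.failures[server] = 0

    def record_timeout(self, server: str) -> None:
        self.failures[server] = self.failures.get(server, 0) + 1

    def pick(self, servers: List[str]) -> str:
        healthy = [s for s in servers if self.failures.get(s, 0) < MAX_FAILURES]
        candidates = healthy or list(servers)  # if all look down, try anyway
        # Servers with no measurement yet count as 0 ms so they get probed.
        return min(candidates, key=lambda s: self.rtt_ms.get(s, 0.0))


stats = ServerStats()
servers = ["ns1.provider-a.example.", "ns1.provider-b.example."]
stats.record_success(servers[0], 20.0)
stats.record_success(servers[1], 35.0)
print(stats.pick(servers))  # the faster server

for _ in range(MAX_FAILURES):
    stats.record_timeout(servers[0])
print(stats.pick(servers))  # the faster server is now avoided
```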
https://handshake.org is the only project I've seen that actually solves the issue with a decentralized root zone file.

https://namebase.io is a "registrar" for it.

Why does this need to have the whole NFT / crypto / auction angle?

https://learn.namebase.io/starting-from-zero/how-to-get-a-na...

This is so convoluted it actually makes the whole thing a non-starter

Decentralized control of a centralized finite resource (domain names) requires consensus. For example, Joe Smith and Joe Blow both want joe.com.

You want a protocol that gives consistent "global" state without any centralized / trusted users - blockchain/bitcoin is one of the only technical solutions to provide that.

I agree that it's a garbage solution in practice, but that's why it's got cryptoshit bundled in.

A potential different solution to DNS monopoly, if that is a problem that needs solving, is multiple name-resolution providers that have differing records on what name points where. (The tradeoff is that an owner may need to register their name with multiple different providers).

Agreed. Blockchain is a convoluted solution, but it’s a solution for distributed consensus, if one feels that’s required. But in general I would argue the current root system has served us well and is open and free.

The world you describe, effectively with multiple roots, is coming. Russia have a switch (they’ve even tested it), to anycast out the root DNS IPs within the country, and block them externally. In theory this doesn’t make another “internet” (if IP space is still globally routable,) but in practice it does. Don’t be surprised if other countries follow suit (should they fail to leverage control of current infra via ITU or something.)

You can still hardcode IP addresses. Not sure most people realize DNS isn't actually needed, you know, except for convenience and all that.
The "Host:" header in http[s] pretty much killed that. Half the internet would be a Cloudflare error page if we moved back to ip addresses :)
Add the name/IP to your local hosts file. It all works great then. Until the server changes IPs, anyways.

I did this with a website I liked which had let the domain expire. It worked for quite some time, until the VPS/whatever expired too. Good thing the Internet Archive is a thing.

Meh. Without DNS, or something similar, there really is no internet.

Obviously you are technically correct.

The internet gets along quite fine without DNS. Packets route from network to network. DNS is an application-layer protocol. People often confuse the web with the internet. We use phone numbers for phone calls. It's conceivable with IPv6 you could nail up your IP address and use a QR code to make the addresses accessible. In a hundred years will DNS still be necessary? I don't think so.
It’s one of the most successful, global, distributed databases of all time.

What’s the single point of failure?