by LinuxBender 1233 days ago
Is it a good idea to set up my own nameserver which basically just "copies" the entries from my current provider and specify it (wherever that may be)? By doing this I won't have to maintain 2 different NS, only the one from the provider, since the 'secondary' will simply be a copy of the primary.

Do you mean authoritative secondary replicas? If so, that is not uncommon. If your DNS provider is being targeted and your company is not, then your DNS servers will still respond and a percentage of clients will try them. While root servers allow 10 records, anything beyond 4 becomes less useful, as different resolvers cap the number of NS records they will try. You can look up which OS/resolver has what behavior and then make a decision based on what OS most of your customers use. Amazon, for example, uses batches of 5 anycast records. If your commercial DNS provider is also anycast, then one could use 2 of those records and 2 of your own company-hosted DNS servers just fine.

Look into how your commercial DNS provider handles zone transfers, then set up a couple of decent servers that pull in all the zones you want redundancy for. Just know there is no concept of priority, meaning the order in which they are listed in the root servers does not matter. Whatever servers you set up will need to take a percentage of the traffic your commercial provider is absorbing. If doing this on a VPS provider I would suggest Vultr, as they support anycast, meaning you can spin up many VMs to handle the load and still only have a couple of public IP addresses without any load-balancer bottlenecks.
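
To get a rough feel for that percentage, here is a minimal sketch. It assumes resolvers pick uniformly at random among the listed NS records, which is a simplification (many resolvers favor whichever server answers fastest), and the function names are made up for illustration.

```python
from fractions import Fraction


def self_hosted_share(provider_ns: int, own_ns: int) -> Fraction:
    """Share of queries reaching your own servers.

    Assumption: every NS record is equally likely to be chosen.
    Real resolvers often weight by response time, so treat this
    as a starting estimate only.
    """
    total = provider_ns + own_ns
    if total == 0:
        raise ValueError("need at least one NS record")
    return Fraction(own_ns, total)


def own_queries_per_second(total_qps: float, provider_ns: int, own_ns: int) -> float:
    """Estimated query rate your own servers must absorb."""
    return total_qps * float(self_hosted_share(provider_ns, own_ns))


if __name__ == "__main__":
    # 2 provider records plus 2 self-hosted records, 1000 queries/s overall.
    print(self_hosted_share(2, 2))
    print(own_queries_per_second(1000.0, 2, 2))
```

With the 2 + 2 split mentioned above, the estimate is that your own servers see about half of the authoritative traffic, so size them accordingly.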

Is it a good idea to simply increase the TTL of the important A/MX-Records? Will for example, 1.1.1.1 still resolve my domain correctly, even if my providers nameserver is down for an hour? (assumed I have a TTL of 3 hours for example)

There are pros and cons to high TTLs depending on how your organization handles changes, failovers, etc. There are some discussions on the web about these pros and cons, too many to name here. It is also important to understand how clients actually cache high TTLs. For example, some clients will cap NS TTLs to 86400 seconds regardless of how high they are, and some clients will cap A TTLs to 1 or 3 days. Then there is the factor of recursive server memory and end-users. ISP caches will expire records much faster regardless of TTL due to memory pressure. Each ISP and public DNS server handles this a little differently. So a high TTL can sometimes help, assuming your infrastructure does not depend on being able to fail things over fast and that you are not planning on changing MX end-points. It takes some foresight in how one architects their infrastructure to fully realize the benefits of a higher TTL without incurring operational risk.
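
As a rough illustration of the capping behavior, here is a small sketch applied to the 3 hour TTL / 1 hour outage example from the question. The function names are hypothetical, and it deliberately ignores early eviction from memory pressure, so it shows the best case.

```python
def effective_ttl(record_ttl: int, resolver_cap: int) -> int:
    """TTL a resolver actually honors when it enforces a maximum."""
    return min(record_ttl, resolver_cap)


def survives_outage(record_ttl: int, resolver_cap: int,
                    age_at_outage_start: int, outage_seconds: int) -> bool:
    """True if an already cached record stays valid for the whole outage.

    age_at_outage_start is how long (in seconds) the record had been
    sitting in the cache when the authoritative servers went down.
    Assumption: the cache never evicts the entry early.
    """
    remaining = effective_ttl(record_ttl, resolver_cap) - age_at_outage_start
    return remaining >= outage_seconds


if __name__ == "__main__":
    three_hours, one_hour, cap = 3 * 3600, 3600, 86400
    # Cached 30 minutes before the outage: 2.5 hours of validity remain.
    print(survives_outage(three_hours, cap, 1800, one_hour))
    # Cached 2.5 hours before the outage: only 30 minutes remain.
    print(survives_outage(three_hours, cap, 9000, one_hour))
```

So a 3 hour TTL only helps a given resolver if it fetched the record recently enough. Entries that were already close to expiry when the outage started still fail during it.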

I am testing 1.1.1.1 right now and it took many requests to finally get my records cached on all their nodes, so if your domain is popular enough they may be useful.

I suppose that was a long-winded way of saying, "It depends". You should meet with your infrastructure team and think through what systems depend on having a low TTL and keep those low. For anything else a higher TTL is probably fine.

[Edit] It sounds like maybe you were just asking about recursive servers so most of this doesn't even apply.

Some places to browse for more detailed answers would be StackExchange [1], ServerFault [2] and SuperUser [3]. Just be sure to lurk a long time before asking questions. They are particular about how questions are formatted, how on-topic they are for the particular forum, and whether one has done an exhaustive search for existing answers.

[1] - https://unix.stackexchange.com/

[2] - https://serverfault.com/

[3] - https://superuser.com/

1 comments

Thanks a lot for the detailed answer. I think having a higher TTL + using 1.1.1.1 as nameserver would be a good idea. I'll let the 'infrastructure team' know about this suggestion.

If using public DNS servers to resolve things, then certainly configure a few of them, e.g. 1.1.1.1 and 8.8.4.4, in case CF has problems.

[Edit] Testing 1.1.1.1 it seems I have finally built cache on all their nodes. It took about 25 to 30 requests. Second test took only a few requests for a different record, same domain. Now I am more curious about their back-end.
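
Purely as a toy model of why it can take that many requests: if each request lands on one of n independent caches chosen uniformly at random, the expected number of requests to touch all of them is n * H_n (the coupon collector result). This is an assumption for illustration and says nothing about how their back-end is actually built.

```python
import random


def expected_requests_to_warm(nodes: int) -> float:
    """Expected requests until every cache has been hit once: n * H_n."""
    return nodes * sum(1.0 / k for k in range(1, nodes + 1))


def simulate_requests_to_warm(nodes: int, rng: random.Random) -> int:
    """One random trial: count requests until all caches were hit."""
    seen = set()
    requests = 0
    while len(seen) < nodes:
        seen.add(rng.randrange(nodes))  # request lands on a random cache
        requests += 1
    return requests


if __name__ == "__main__":
    for n in (8, 9, 10):
        print(n, round(expected_requests_to_warm(n), 1))
    print(simulate_requests_to_warm(10, random.Random(0)))
```

Under this model, 25 to 30 requests would line up with roughly 9 or 10 independent caches behind the address I was hitting, but that is only what the toy math suggests.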

If you have your own recursive servers in your datacenter then you can entirely control how many things are cached and how that cache behaves. Unbound is a really good option for this, as it is fast and has controls around memory, threads and min/max TTL. You could even push your authoritative zones to the edge Unbound nodes if desired so that those records never expire. Some people take this a step further in their datacenter and run Unbound on every instance to keep response latency low and handle upstream recursive fail-over better than the OS resolver does.

Another benefit to running your own caching servers is that you can purge records that you know are out of date during outages.