Its always DNS: Why the default BIND setup is failing to resolve (smallhacks.wordpress.com)
2 points by samm_cz 51 days ago
3 comments

One thing I would add: wherever query logs are written, even temporarily for debugging, make sure it is a tmpfs mount. BIND and a handful of other daemons will block when log buffers back up, and that can slow down the DNS server. Tmpfs is not a perfect fix, but it is an improvement. In this example /var/log/named/ should be tmpfs. In /etc/fstab it would look something like:

    tmpfs   /var/log/named/        tmpfs   nosuid,nodev,size=2g,noatime 0 0

That said, I would pick a dedicated location for tmpfs logs that need not survive a reboot, such as /var/log/named/tmpfs. It can also be useful to specify the uid and gid that own the tmpfs mount, so that statistics tools can read the logs without world permissions. DNS load-testing tools can show the difference between tmpfs and disk, even if that disk is NVMe. We need not wear down our storage.

Include this debug location in log rotation and adjust the tmpfs size and log rotate frequency according to the DNS server usage under its highest theoretical load.
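For sizing, a rough back-of-the-envelope sketch in Python might help. It assumes one log line per query and that the tmpfs only has to hold what is written between two rotations, padded by a headroom factor. The function name and all the numbers are made up for illustration; measure your own peak qps and average line length.

```python
import math


def tmpfs_size_bytes(peak_qps, bytes_per_line, rotate_seconds, headroom=2.0):
    """Estimate the bytes of query log written between two rotations at peak load."""
    # Assumption: exactly one log line per query.
    raw = peak_qps * bytes_per_line * rotate_seconds
    # Headroom covers bursts and a rotated file that has not been removed yet.
    return math.ceil(raw * headroom)


if __name__ == "__main__":
    # Hypothetical load: 5,000 qps peak, ~200-byte lines, rotation every 5 minutes.
    size = tmpfs_size_bytes(5000, 200, 300)
    print(f"suggested tmpfs size: {size / 2**20:.0f} MiB")
```

If rotated files stay on the same tmpfs, multiply by the number of files kept, or move them to disk during rotation.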

Another thing I do is keep scripts that display counts of NOERROR and NXDOMAIN by domain, so I can quickly see where I am having a problem, though I am sure HN can come up with better ideas.
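A minimal sketch of such a counting script, in Python. The input format is an assumption, not something BIND writes by default: each line is taken to be whitespace-separated `timestamp qname qtype rcode`, as you might get from a converted dnstap or packet capture. Grouping by the last two labels is also a simplification and will mishandle suffixes like co.uk.

```python
from collections import Counter, defaultdict


def count_rcodes(lines):
    """Return {domain: Counter({rcode: n})} for NOERROR and NXDOMAIN lines."""
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Assumed layout: timestamp, qname, qtype, rcode.
        qname, rcode = fields[1], fields[3]
        if rcode not in ("NOERROR", "NXDOMAIN"):
            continue
        # Crude grouping by the last two labels of the query name.
        labels = qname.rstrip(".").lower().split(".")
        counts[".".join(labels[-2:])][rcode] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines in the assumed format.
    sample = [
        "t1 www.example.com. A NOERROR",
        "t2 foo.example.com. A NXDOMAIN",
        "t3 bar.example.com. AAAA NXDOMAIN",
    ]
    # Show the domains with the most NXDOMAIN answers first.
    ranked = sorted(count_rcodes(sample).items(), key=lambda kv: -kv[1]["NXDOMAIN"])
    for domain, c in ranked:
        print(f"{domain}\tNOERROR={c['NOERROR']}\tNXDOMAIN={c['NXDOMAIN']}")
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.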

How the complexity of DNS, DNSSEC and IPv6, plus recently enabled DDoS protection, broke recursive DNS in the default setup

You should rather say: it's always bind (bugs). I wrote about bind eating its query counter on IPv6 even if you don't have IPv6 routing:

https://szafka.net/blog/bind9-as-resolver.html

Run bind with the -4 argument or switch to unbound. Bind quality is the same as it has always been. Nothing has changed in those 25+ years.

Thank you for the article. I tried to report this upstream, but it was closed as "not a bug": https://gitlab.isc.org/isc-projects/bind9/-/issues/5898#

Unfortunately bind is as buggy as it has always been. I tried to black-hole the entire ::/0, but it still eats its query counter without sending out a single packet.

You need a dual-stack network with routing for both IPv4 and IPv6, or you need to run it with the -4 argument on an IPv4-only network.

And them closing bug reports as "works for me" is typical. It's been like this for a long time.