by singhsanjay12
107 days ago
This matches what I've seen too. Resolver-level resilience is often manageable centrally. The harder part is application-level recovery, especially in larger orgs where DNS behavior spans multiple teams. Even with low TTLs or cache-flush automation, apps may resolve once at startup, hold long-lived gRPC/TCP connections, or, even worse, ignore TTL semantics entirely. So infra assumes "DNS healed," but the app never re-resolves.
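To make the "never re-resolves" failure mode concrete, here is a minimal Python sketch of the opposite behavior: look the name up again on every (re)connect, with a bounded cache age. Everything in it is an assumption for illustration: the class name `ReResolvingEndpoint`, the `max_age` parameter, and the injected `resolve`/`clock` hooks are hypothetical. Note that `socket.getaddrinfo` does not expose the record's TTL, so `max_age` is an application-chosen bound, not the real TTL.

```python
import socket
import time
from typing import Callable, List, Optional


def system_resolve(host: str, port: int) -> List[str]:
    """Resolve via the OS resolver; return unique IPs in resolver order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    ips: List[str] = []
    for info in infos:
        ip = info[4][0]
        if ip not in ips:
            ips.append(ip)
    return ips


class ReResolvingEndpoint:
    """Hypothetical helper: never trust a lookup older than max_age seconds."""

    def __init__(
        self,
        host: str,
        port: int,
        max_age: float = 30.0,  # assumed bound; the real TTL is not visible here
        resolve: Callable[[str, int], List[str]] = system_resolve,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._host = host
        self._port = port
        self._max_age = max_age
        self._resolve = resolve
        self._clock = clock
        self._addrs: Optional[List[str]] = None
        self._resolved_at = float("-inf")  # forces a lookup on first use

    def addresses(self) -> List[str]:
        """Call this on every (re)connect instead of caching the result."""
        now = self._clock()
        if self._addrs is None or now - self._resolved_at >= self._max_age:
            try:
                self._addrs = self._resolve(self._host, self._port)
                self._resolved_at = now
            except OSError:
                # socket.gaierror is an OSError. With no earlier answer we
                # must fail; otherwise keep the stale one and retry next call.
                if self._addrs is None:
                    raise
        return self._addrs

    def invalidate(self) -> None:
        """Call on connection failure so the next connect re-resolves."""
        self._resolved_at = float("-inf")
```

A reconnect loop would call `addresses()` before each dial and `invalidate()` whenever a connection drops or is refused. Long-lived gRPC/TCP connections still need some trigger to reconnect (for example a maximum connection age), since re-resolving does nothing for a socket that is already open.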
|
I sense you may have Java in your environment and are probably used to
or something along those lines, including other options. At least I tried to get teams to use those and then rely on the Unbound DNS cache and retry schemes. systemd also has its own resolver cache, which can be disabled so that the host uses a local instance of Unbound instead. Windows servers require Group Policy and registry modifications to change their behavior.
One of my pet peeves is when groups do not manage domain/search correctly and do not use FQDNs in the application configuration, resulting in 3x or 4x or more the number of DNS requests, which also amplifies all DNS problems/outages. That really grinds my gears.
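To illustrate the search-list amplification, here is a simplified Python model of how a glibc-style stub resolver expands a name using the `search` and `ndots` settings from /etc/resolv.conf. It is a sketch, not the resolver's actual code: the function name `candidate_names` and the example domains are made up, and real resolvers stop at the first successful answer and typically send both A and AAAA queries per candidate.

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Names a stub resolver may try, in order (simplified model)."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name]
    absolute = name + "."
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, then try the name as-is.
    return expanded + [absolute]


# Hypothetical search list and name, for illustration only.
search = ["prod.example.com", "example.com", "corp.example.com"]

for candidate in candidate_names("db.internal.example.net", search, ndots=5):
    print(candidate)

# With a trailing dot there is exactly one candidate.
print(candidate_names("db.internal.example.net.", search, ndots=5))
```

In this model the unqualified name yields four candidates (three search-list expansions that must fail before the real name is tried), while the trailing-dot form yields one, which is where a 3x or 4x multiplier can come from.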
And of course, if the Linux system uses glibc, editing /etc/gai.conf to prefer IPv4 or IPv6, depending on which is primarily used inside the data center, makes a big difference.