by fotta 595 days ago
> Golang HTTP2 clients will reuse the first server they can connect to over and over and the DNS is never re-resolved.

I’m not a DNS expert but shouldn’t it re-resolve when the TTL expires?

6 comments

You nerd sniped me. The guts of how http2 deals with this in golang are in transport.go: https://github.com/golang/go/blob/master/src/net/http/transp...

If I’m reading the code right, round trips (HTTP requests) go through queueForIdleConn, which picks up any pre-existing connections to a host. The only times these connections are cleaned up (in HTTP2) are if keepalives are turned off and the connection has been idle for too long, OR the connection breaks in some way, OR the max number of connections is hit and LRU cache evictions take place.
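
A rough way to see the pooling from the outside, as a sketch rather than a reading of transport.go: net/http/httptrace reports whether a request got a pooled connection. The helper names (fetchAndReportReuse, reusePattern) are made up for illustration, and httptest.NewServer speaks HTTP/1.1, so this only shows the idle-pool behaviour in general, not the HTTP2 path itself.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// fetchAndReportReuse (hypothetical helper) performs one GET and reports
// whether the transport handed back a pooled connection instead of dialing.
func fetchAndReportReuse(client *http.Client, url string) (bool, error) {
	reused := false
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
	}
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	// Drain the body so the connection can go back to the idle pool.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return false, err
	}
	return reused, nil
}

// reusePattern (hypothetical helper) issues n sequential requests against a
// throwaway local server and records, per request, whether a pooled
// connection was used.
func reusePattern(n int) ([]bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	client := &http.Client{Transport: &http.Transport{}}
	out := make([]bool, 0, n)
	for i := 0; i < n; i++ {
		reused, err := fetchAndReportReuse(client, srv.URL)
		if err != nil {
			return nil, err
		}
		out = append(out, reused)
	}
	return out, nil
}

func main() {
	pattern, err := reusePattern(3)
	if err != nil {
		panic(err)
	}
	// The first entry is the only request that needed a dial.
	fmt.Println(pattern)
}
```

Only a dial can trigger a name lookup, so every request that reports a reused connection never touches DNS at all.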

Furthermore, the golang dnsclient doesn’t even expose record TTLs to callers so how could the HTTP2 transport know when an entry is stale? https://github.com/golang/go/blob/master/src/net/dnsclient_u...
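
To make the "no TTL" point concrete, here is a minimal sketch using the public resolver API. The wrapper name resolve and the 2 second timeout are arbitrary choices of mine; the relevant part is the return type of LookupHost.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

// resolve (hypothetical wrapper) returns the addresses for host. The result
// is a plain []string: there is no field through which a record TTL could
// reach the caller.
func resolve(host string) ([]string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return net.DefaultResolver.LookupHost(ctx, host)
}

func main() {
	addrs, err := resolve("localhost")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	// Only addresses come back; nothing says how long they stay valid.
	fmt.Println(addrs)
}
```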

It should, but like the sibling, I haven't seen what Go does. I've seen it happen elsewhere. Exchange used to cache any answer it got until it restarted. Java has had that behavior from time to time if you're not careful as well.

Querying DNS can be expensive, so it makes sense to build a cache to avoid querying again when you don't need to, but typical APIs for name resolution such as gethostbyname / getaddrinfo don't return the TTL, so people just assume forever is a good TTL. Especially for a persistent (http) connection, it kind of makes sense to never query DNS again while you already have a working connection that you made with that name. Likewise, if it's TLS, it's quite possible that you don't check whether the certificate has expired while you're connected or when you do a session resumption.

But innocent things like this add up to make operating services tricky. Many times, if you start refusing connections, clients figure it out, but sometimes the caches still don't get cleared.

> but typical APIs for name resolution such as gethostbyname / getaddrinfo don't return the TTL

Oh wow I didn’t know this but I looked it up and you’re right. Interesting.

I've seen DNS only be refreshed when restarting on embedded devices I work with too. They use a proprietary HTTP library...

I don't know about Golang but I swear I've seen this before as well - clients holding on to an old IP address without ever re-resolving the domain name. It makes me wary of using DNS for load balancing or blue-green deployments. I feel like I can't trust DNS clients.

It's been 8-10 years but when I was serving tracking pixels we were astonished how long we still got requests from residential IPs for whole hostnames we had deprecated. That means I would not trust DNS caching anyway. I'm not talking days here, but months, with a TTL set to mere days.

Some reasons to connect to the same IP: TCP Fast Open, TLS session resumption, connection pools, residual censorship.

The other reason: you have an open TCP socket that you're actively using. Unless you finish with that connection or it breaks, why would you re-resolve it when you're not running connect() a second time? The failure mode we noticed most when looking into why clients weren't following DNS changes is that they were long-lived connections, like a server copying a large file or streaming logs. Which isn't unusual if you think about it, just not a short-lived web browser or curl-esque connection.

TTL isn't universally respected. Consider the following path:

Your machine -> Local router -> Configured upstream DNS Server (ISP/CF/Quad8/etc) -> ? -> Authoritative DNS Server

Any one of those layers can override/mess with/cache in a variety of ways including TTL. This is why Cloudflare and a variety of other providers use IP anycast. They accepted DNS for what it is and worked around it.

Not only is the IP always the IP, the "global" BGP routing table actually universally and consistently updates much faster than DNS. Then whatever routers, machines, etc downstream from that don't matter.

I read through the golang code once due to coming across this issue with kubernetes clients which use the standard golang http client under the hood.

I would need to re-read the code to refresh my memory.

Not an expert, but overall: unless the connection closes for some reason, re-resolution does not happen.
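
One workaround people reach for is dropping pooled connections on purpose, so the next request has to dial (and therefore resolve) again. A sketch of the effect with http.Transport.CloseIdleConnections; the helper names (getReused, redialAfterClose) are made up, the test server is HTTP/1.1, and this only affects idle connections, so a connection that is busy the whole time stays where it is.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// getReused (hypothetical helper) does one GET and reports whether a pooled
// connection was used for it.
func getReused(client *http.Client, url string) (bool, error) {
	reused := false
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
	}
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return false, err
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	// Drain the body so the connection returns to the idle pool.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return false, err
	}
	return reused, nil
}

// redialAfterClose (hypothetical helper) warms the pool, checks that the next
// request reuses the connection, then closes idle connections and checks again.
func redialAfterClose() (beforeClose, afterClose bool, err error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	defer srv.Close()

	tr := &http.Transport{}
	client := &http.Client{Transport: tr}

	if _, err = getReused(client, srv.URL); err != nil { // warm the pool
		return false, false, err
	}
	if beforeClose, err = getReused(client, srv.URL); err != nil {
		return false, false, err
	}
	tr.CloseIdleConnections() // empty the idle pool
	if afterClose, err = getReused(client, srv.URL); err != nil {
		return false, false, err
	}
	return beforeClose, afterClose, nil
}

func main() {
	before, after, err := redialAfterClose()
	if err != nil {
		panic(err)
	}
	fmt.Println("reused before close:", before, "reused after close:", after)
}
```

In a real client you would call CloseIdleConnections from a ticker, or set a finite IdleConnTimeout on the transport, and accept the cost of extra handshakes.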

Also, Java historically had a TTL of -1 (i.e. infinite) by default, causing a lot of headaches with ephemeral/container services.