Hacker News
by schemescape 1094 days ago
Can anyone help me understand musl libc and DNS issues? Note that I’m most interested in this “DNS over TCP” issue, since the other case I’ve heard of is for custom DNS setup—not for resolving host names in a default configuration.

My reading indicates that DNS resolution simply might not work in certain cases. This seems like a huge problem, yet Alpine Linux is widely deployed and I think Zig uses musl libc as well. In fact, every fully static binary I’ve seen (except for Go?) relies on musl.

For what it’s worth, I’ve seen DNS errors on Alpine (specifically I was getting EAGAIN), but I assumed this was unrelated to musl (and am still unsure). In general, I feel like I see a lot more transient networking errors on Alpine, and I wonder if this is related.

Edit: I also didn’t think DNS would be in libc, so I’ve got a lot to learn…

2 comments

If you use musl 1.2.4+ (or Alpine 3.18+), the old DNS fallback issue no longer applies: https://www.openwall.com/lists/musl/2023/05/02/1

To summarize the issue: DNS queries are sent optimistically over UDP because it's faster, but a plain DNS-over-UDP response is limited to 512 bytes (unless EDNS is negotiated), so a larger response comes back truncated. TCP is supposed to be the fallback when that happens. Large responses are uncommon for ordinary lookups, but they show up more and more in special scenarios, for instance when you're querying an internal DNS server for service discovery (read: k8s or nomad deployments, most commonly).
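For anyone who wants to see the mechanism, here's a rough Python sketch of "try UDP, retry over TCP if the truncated (TC) bit is set". It's only an illustration of the protocol, not how musl or glibc implement it. The function names (`build_query`, `is_truncated`, `query_with_fallback`) are made up for this example, and the server address you pass in is up to you.

```python
import socket
import struct

TC_FLAG = 0x0200  # "truncated" bit in the 16-bit DNS header flags field


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (qtype 1 = A record, class IN)."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Each label is encoded as a length byte followed by the label itself.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def is_truncated(response: bytes) -> bool:
    """Return True if the TC bit is set in the response header."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full message arrived")
        buf += chunk
    return buf


def query_with_fallback(name: str, server: str, timeout: float = 2.0) -> bytes:
    """Send the query over UDP; if the reply is truncated, retry over TCP."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, 53))
        response, _ = udp.recvfrom(4096)
    if not is_truncated(response):
        return response
    # DNS over TCP prefixes every message with a two-byte length field.
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        tcp.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return recv_exact(tcp, length)
```

A resolver without the second half of `query_with_fallback` is stuck with whatever fit in the UDP reply, which is the behavior being discussed here.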

musl's maintainer read the spec as not requiring TCP fallback in a libc resolver (source: https://twitter.com/RichFelker/status/994629795551031296?lan...), so for a long time musl simply didn't support this feature, justifying it as better UX because of the more predictable performance.

I don't agree with the maintainer's interpretation, but as an otherwise very happy Alpine user I'm glad the feature was added and the issue is no longer a concern!

I’d found bits and pieces of this, but I didn’t have all the context. Thank you for summarizing!

I'd say he was wrong here, and his assumption was incorrect.

RFC 2181 specifically says 'Where TC is set, the partial RRSet that would not completely fit may be left in the response.'

'May be' are the key words. It's up to the implementation to decide whether a truncated response includes any records at all, and many include none.
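To make that concrete, here's a small Python sketch that looks only at the 12-byte DNS header and reports whether a response is complete, truncated with some answers, or truncated with none. `classify_response` is a hypothetical helper for illustration; it doesn't parse the records themselves.

```python
import struct

TC_FLAG = 0x0200  # "truncated" bit in the DNS header flags field


def classify_response(response: bytes) -> str:
    """Classify a DNS response by its TC bit and answer count (header only)."""
    _id, flags, _qdcount, ancount, _nscount, _arcount = struct.unpack(
        "!HHHHHH", response[:12]
    )
    if not flags & TC_FLAG:
        return "complete"
    if ancount == 0:
        # Allowed by the "may be left" wording: the server sent no records.
        return "truncated, no answers"
    return "truncated, partial answers"
```

A resolver that never retries over TCP gets nothing usable in the "truncated, no answers" case, so it can't count on partial data being there.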

It sounds like DNS-over-TCP should be supported now: https://wiki.musl-libc.org/functional-differences-from-glibc...

Edit: Hacker News link from last month:

https://news.ycombinator.com/item?id=35964717