Hacker News
by agolliver 1786 days ago
Not sure where better to ask this, but we're getting to the point in most OSes where basically anything you want can finally be done truly async (non-blocking).

Except that DNS apparently still blocks [0], so runtimes usually farm DNS requests off to their own blocking thread pools (the author ends up disabling DNS just so they can "prove" that everything works on a single thread).
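
A minimal sketch of that thread-pool workaround in Python's asyncio, for illustration: the blocking `socket.getaddrinfo()` runs on a worker thread while the event loop keeps serving other tasks. The `resolve` helper and the pool size of 4 are hypothetical choices; CPython's own `loop.getaddrinfo()` takes essentially the same route through the loop's default executor.

```python
import asyncio
import socket
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dedicated pool just for name resolution (size chosen arbitrarily).
_dns_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="dns")


async def resolve(host: str, port: int, flags: int = 0) -> list[tuple[str, int]]:
    """Run the blocking getaddrinfo() on a worker thread so the event loop stays free."""
    loop = asyncio.get_running_loop()
    infos = await loop.run_in_executor(
        _dns_pool,
        # run_in_executor only forwards positional args, so bind keywords in a lambda.
        lambda: socket.getaddrinfo(host, port, type=socket.SOCK_STREAM, flags=flags),
    )
    # Each entry is (family, type, proto, canonname, sockaddr); keep (address, port).
    return [(sockaddr[0], sockaddr[1]) for *_, sockaddr in infos]


async def main() -> None:
    # AI_NUMERICHOST keeps the demo offline: the address is parsed, no query is sent.
    print(await resolve("127.0.0.1", 80, flags=socket.AI_NUMERICHOST))


if __name__ == "__main__":
    asyncio.run(main())
```

The event loop itself never blocks here, but every in-flight lookup still occupies one pool thread, which is exactly the overhead the question is about.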

What's so fundamentally difficult about writing an async/non-blocking DNS resolver? Is it just a lack of a real need for it?

[0] https://gist.github.com/djspiewak/46b543800958cf61af6efa8e07... (quote: "People have been trying to build asynchronous DNS resolvers for decades without success. Can it be done? Yes. Has it been done? No.")


I found this comment in the liburing issue tracker [0], which clarifies some things for me. So is it because something like getaddrinfo is not a single simple operation but a hodgepodge of steps like "first try nscd, then read resolv.conf, then make a bunch of requests using one of many different protocols"?

I think I have a lot more to learn about DNS...

[0] https://github.com/axboe/liburing/issues/26#issuecomment-738...

> [re: implementing getaddrinfo] It's not planned because it's not a single operation but a complex beast reading files, exchanging messages, etc. IMHO, as it's not a single operation it fundamentally doesn't fit liburing, but implementing under some framework on-top would be more viable.
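
To make the "complex beast" concrete, here is a heavily simplified sketch of decisions a getaddrinfo-style lookup makes before any packet is sent: consult an /etc/hosts-style table, then fall back to the nameservers and search domains from a resolv.conf-style file, using the `ndots` rule to order candidate names. The function names are hypothetical, and the sketch ignores nsswitch.conf, nscd, most `options` lines, address sorting, and the network I/O itself.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname/alias in an /etc/hosts-style file to its first listed address."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), addr)   # first match wins
    return table


def parse_resolv_conf(text: str) -> tuple[list[str], list[str]]:
    """Return (nameservers, search domains) from a resolv.conf-style file."""
    nameservers: list[str] = []
    search: list[str] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] in ("search", "domain"):
            search = fields[1:]                    # the last search/domain line wins
    return nameservers, search


def plan_lookup(name: str, hosts_text: str, resolv_text: str, ndots: int = 1):
    """Return ("hosts", [address], []) or ("dns", candidate names in order, nameservers)."""
    hosts = parse_hosts(hosts_text)
    if name.lower() in hosts:
        return "hosts", [hosts[name.lower()]], []
    nameservers, search = parse_resolv_conf(resolv_text)
    if name.endswith("."):                         # fully qualified: skip the search list
        candidates = [name]
    elif name.count(".") >= ndots:                 # enough dots: try the name as-is first
        candidates = [name] + [f"{name}.{d}" for d in search]
    else:                                          # too few dots: search domains first
        candidates = [f"{name}.{d}" for d in search] + [name]
    return "dns", candidates, nameservers
```

For example, with `search corp.example` and the default `ndots:1`, a bare name like `db` is tried as `db.corp.example` before `db`, and each candidate can cost another round trip to a nameserver.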

There is real need, and there are solutions.

NodeJS has a built-in DNS resolution API which is fully async / libuv-cooperative: https://nodejs.org/api/dns.html#dns_dns_resolve_hostname_rrt...

Other async runtimes bundle libs like c-ares to implement non-threadpool-based async DNS.

Quoting the Node.js docs: "These functions are implemented quite differently than dns.lookup(). They do not use getaddrinfo(3) and they always perform a DNS query on the network. This network communication is always done asynchronously, and does not use libuv's threadpool."
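
What a resolver of this kind sends, instead of calling getaddrinfo(3), is an ordinary DNS message, usually over UDP. A minimal sketch of building an A-record query in the RFC 1035 wire format; `build_query` and `encode_name` are hypothetical helpers, and a real resolver also needs random query IDs, response parsing (including name compression), retries, and TCP fallback for truncated replies.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte (RFC 1035)."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:                  # labels are 1..63 octets
            raise ValueError(f"bad label {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)                                  # root label terminates the name
    return bytes(out)


def build_query(name: str, qid: int, qtype: int = 1) -> bytes:
    """Build a recursive query: 12-byte header plus one question (QTYPE, QCLASS=IN)."""
    # Header fields: ID, flags (only RD set -> 0x0100), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)
    return header + question


if __name__ == "__main__":
    print(build_query("example.com", 0x1234).hex())
```

Because this is just bytes on a socket, it can go through the same non-blocking machinery as any other network I/O.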

On the OS level there is no difference between "sync" and "async". They become different in programming language runtimes.

Not really. mkdir(2) is definitely sync, io_uring has been called asynchronous since the start, and its predecessor has "asynchronous" in its name: https://lwn.net/Articles/776703/

There is a difference between blocking and non-blocking, which is what I am asking about. Language runtimes apparently cannot provide non-blocking DNS (as evidenced by this article, and the other one I linked, which was on HN a week or so ago).

DNS seems like it perfectly fits the model of "give me a buffer (to hold the resolved IP), and I will wake you up when it's filled". Maybe io_uring (or IOCP?) can already support this, and these articles are just mistaken about the current (or soon-to-be) state of the art? Or is there some fundamental reason about DNS that makes writing a non-blocking resolver very, very difficult?
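
For the network part, that model maps directly onto a UDP exchange driven by an event loop. A minimal single-threaded sketch with asyncio: the request is handed to the loop with a non-blocking `sendto`, and the first reply completes a future that the caller awaits. `udp_request`, `_OneShot`, and the reversing test server are hypothetical; in a resolver the payload would be a DNS query and the server a nameserver taken from resolv.conf.

```python
import asyncio


class _OneShot(asyncio.DatagramProtocol):
    """Send one datagram, then complete a future with the first reply."""

    def __init__(self, payload: bytes, done: asyncio.Future):
        self.payload = payload
        self.done = done

    def connection_made(self, transport):
        transport.sendto(self.payload)             # handed to the loop; no waiting here

    def datagram_received(self, data, addr):
        if not self.done.done():
            self.done.set_result(data)             # "wake up" whoever awaits the future

    def error_received(self, exc):
        if not self.done.done():
            self.done.set_exception(exc)


async def udp_request(payload: bytes, server: tuple[str, int], timeout: float = 2.0) -> bytes:
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    transport, _ = await loop.create_datagram_endpoint(
        lambda: _OneShot(payload, done), remote_addr=server
    )
    try:
        return await asyncio.wait_for(done, timeout)
    finally:
        transport.close()


class _ReverseServer(asyncio.DatagramProtocol):
    """Stand-in for a nameserver: replies with the reversed payload."""

    def connection_made(self, transport):
        self.transport = transport

    def datagram_received(self, data, addr):
        self.transport.sendto(data[::-1], addr)


async def demo_roundtrip() -> bytes:
    loop = asyncio.get_running_loop()
    server, _ = await loop.create_datagram_endpoint(_ReverseServer, local_addr=("127.0.0.1", 0))
    port = server.get_extra_info("sockname")[1]
    try:
        return await udp_request(b"ping", ("127.0.0.1", port))
    finally:
        server.close()


if __name__ == "__main__":
    print(asyncio.run(demo_roundtrip()))
```

Both the client and the fake server run on one thread; the hard part of a real resolver is everything around this exchange, not the exchange itself.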

It's just very odd to me that userspace apps are creating their own blocking thread pools just to run DNS stuff, when they can do seemingly everything else with just a single thread if they wanted to.

(edited a few times, sorry if I caught someone who was mid-reply)

> There is a difference between blocking and non-blocking

Not really.

"Blocking" means "tell the scheduler to mark the process idle until operation finishes", while "non-blocking" means "don't mark the process but flip a semaphore bit when operation finishes".

From the point of view of the OS, the difference is just which bit to flip.
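
The distinction can be seen from user space on a socket: a blocking `recv()` would park the calling thread until data arrives, while a non-blocking one returns at once with EAGAIN (surfaced in Python as `BlockingIOError`), and the program later asks the kernel which descriptors became ready. A minimal sketch using Python's `selectors` module with a local `socketpair()`; `demo_nonblocking` is a hypothetical name.

```python
import selectors
import socket


def demo_nonblocking() -> list[str]:
    """Show the two outcomes of reading from a non-blocking socket."""
    events: list[str] = []
    a, b = socket.socketpair()                     # two connected local sockets
    sel = selectors.DefaultSelector()              # best selector available on this platform
    try:
        b.setblocking(False)                       # recv() must never put this thread to sleep
        try:
            b.recv(1024)
        except BlockingIOError:                    # EAGAIN/EWOULDBLOCK: no data yet
            events.append("would block")

        sel.register(b, selectors.EVENT_READ)
        a.sendall(b"hello")                        # the peer writes some data
        # The only place the thread waits: until a registered socket is readable.
        for key, _mask in sel.select(timeout=1.0):
            events.append(key.fileobj.recv(1024).decode())
    finally:
        sel.close()
        a.close()
        b.close()
    return events


if __name__ == "__main__":
    print(demo_nonblocking())
```

This is the readiness style (select/epoll/kqueue): the kernel reports "you can read now" and the read still happens in user code. io_uring and IOCP are completion-based instead, closer to the "fill my buffer and wake me up" model from the question.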

On Windows especially, I/O is usually async; the sync APIs serialize all I/O on the file object (even across threads).

This has weird consequences: e.g. you can't use a named pipe from two threads, one reading and one writing, without protocol-level coordination (say, in a proxy). You need to use asynchronous I/O to get it right.