by zamadatix 19 days ago
They mean anyone will only ever be able to get some of the mappings rather than all of them, because of how DNS works. Even that site is unable to find every mapping for every IP. The first option for building such a dataset is rDNS lookups, but configuring rDNS is optional and many operators don't bother (opting to just have the forward lookup). The other option is collecting domains and seeing which IPs they resolve to, but this is only ever partial (there is no such thing as a complete list of the domains in the world) and has additional problems with CDNs or anything hosted in more than one place at once (you would need to do a lookup from every possible location each domain could conceivably be served from to be sure you get all of the reverse mappings for that domain).
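
For anyone who wants to poke at the two approaches, here is a rough Python sketch using only the standard library. The function names (`ptr_name`, `reverse_lookup`, `forward_lookup`) are made up for illustration, and the results depend entirely on which resolver you ask and from where.

```python
import ipaddress
import socket


def ptr_name(ip):
    """Return the DNS name that would hold the PTR record for this IP."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Approach 1: rDNS. Returns [] when the owner never configured a PTR."""
    try:
        hostname, aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return []
    return [hostname, *aliases]


def forward_lookup(domain):
    """Approach 2: resolve a known domain and record the IPs seen from here."""
    try:
        infos = socket.getaddrinfo(domain, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}
```

`reverse_lookup` only ever tells you what the IP's owner chose to publish, and `forward_lookup` only helps for domains you already know about and only from your vantage point, which is why neither gives the full picture.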

As an example: unless you know my domains ahead of time, you'll never be able to work out which domains are hosted on my IPs, because I don't bother to configure rDNS. So those IPs will look like they host nothing (or only some of my domains, if you only had a partial list) rather than everything that is actually on them.
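
To make that concrete, here is a small sketch of inverting a forward-lookup dataset into an IP-to-domains view. The domains and IPs are placeholder documentation values, and `invert` is just an illustrative name.

```python
from collections import defaultdict


def invert(forward):
    """Turn {domain: {ips}} into {ip: {domains}}."""
    reverse = defaultdict(set)
    for domain, ips in forward.items():
        for ip in ips:
            reverse[ip].add(domain)
    return dict(reverse)


# Only the domains we happened to know about; anything else is invisible.
known = {
    "a.example": {"192.0.2.10"},
    "b.example": {"192.0.2.10", "192.0.2.11"},
}
view = invert(known)
```

An IP such as 192.0.2.12 that serves only domains missing from `known` never appears in `view` at all, even though it may be hosting plenty.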

Anyways, for free data sources that give you a partial view of this, check out Rapid7's Sonar or Common Crawl. Each should have the pieces needed to construct this kind of view from the data.


Sonar does seem to have data here: https://opendata.rapid7.com/

But it seems you have to go through their sales team to get the data.

I can't find anywhere that Common Crawl makes its DNS resolution data available.

Common Crawl doesn't publish raw DNS data separately; you have to pull the information out of the aggregate database. The WARC-IP-Address header on each record should hold the IP Common Crawl connected to for that site.
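
A minimal sketch of pulling those pairs out, assuming you already have an uncompressed WARC byte stream (for a downloaded `.warc.gz` you could wrap the file with `gzip.open(path, "rb")`). `iter_host_ip` is a made-up helper, not part of any Common Crawl tooling, and the sample records are hand-written placeholders.

```python
import io
from urllib.parse import urlsplit


def iter_host_ip(stream):
    """Yield (hostname, ip) for each WARC record carrying both headers."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.startswith(b"WARC/"):
            continue  # blank separator lines between records
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # end of this record's header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Skip the body by length so payload bytes are never parsed as headers.
        stream.read(int(headers.get("content-length", "0")))
        ip = headers.get("warc-ip-address")
        uri = headers.get("warc-target-uri")
        host = urlsplit(uri).hostname if uri else None
        if ip and host:
            yield host, ip


sample = (
    b"WARC/1.0\r\nWARC-Type: request\r\n"
    b"WARC-Target-URI: http://example.com/page\r\n"
    b"Content-Length: 4\r\n\r\nGET \r\n\r\n"
    b"WARC/1.0\r\nWARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/page\r\n"
    b"WARC-IP-Address: 192.0.2.10\r\n"
    b"Content-Length: 8\r\n\r\nWARC/x\r\n\r\n\r\n"
)
pairs = list(iter_host_ip(io.BytesIO(sample)))
```

Records without a WARC-IP-Address header (like the request record in the sample) are simply skipped. For real crawl files a dedicated WARC library is probably the safer choice; this is just to show where the IP lives.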
Good timing, I'm about to release that dataset.