Hacker News
by 1vuio0pswjnm7 1692 days ago
I will sometimes start with a zone file, like com.zone. I search for keywords in registered domain names. Then I filter by nameserver (registrar). Finally, I run a script to fetch the page titles. You would be surprised at how effective this can be in finding websites that you would never be able to find using simple Google searches. Of course, it is cumbersome; search engines could make this process very easy, but they deliberately disable this type of exploration. You can query Google's index for a list of all websites with domain names that contain a certain keyword, but you will never be able to retrieve the full list of results, and certainly not in a "neutral" order such as alphabetical.
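For anyone who wants to try the keyword and nameserver steps, here is a rough sketch in Python. It is not the script described above, just one way it might look. It assumes the zone file has one NS record per line with a fully qualified owner name in the first field and the nameserver in the last field; real zone files vary, and some use names relative to an $ORIGIN. The function names and the sample records are made up for illustration.

```python
from collections import defaultdict


def parse_ns_records(lines):
    """Map each owner name to the set of nameservers listed for it."""
    delegations = defaultdict(set)
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and directives such as $ORIGIN or $TTL.
        if not line or line.startswith((";", "$")):
            continue
        fields = line.split()
        # Assumed layout: owner [ttl] [class] NS target
        if len(fields) >= 3 and fields[-2].lower() == "ns":
            owner = fields[0].rstrip(".").lower()
            target = fields[-1].rstrip(".").lower()
            delegations[owner].add(target)
    return delegations


def find_domains(delegations, keyword, ns_suffix=None):
    """Return matching domains in plain alphabetical order."""
    matches = []
    for domain, nameservers in delegations.items():
        if keyword not in domain:
            continue
        # Optionally keep only domains delegated to a given nameserver suffix.
        if ns_suffix and not any(ns.endswith(ns_suffix) for ns in nameservers):
            continue
        matches.append(domain)
    return sorted(matches)


# Made-up records for illustration only.
SAMPLE = """\
; sample zone data
$ORIGIN com.
gardenshed.com.\t172800\tin\tns\tns1.hosting-a.example.
gardenshed.com.\t172800\tin\tns\tns2.hosting-a.example.
mygardenblog.com.\t172800\tin\tns\tns1.hosting-b.example.
carparts.com.\t172800\tin\tns\tns1.hosting-a.example.
""".splitlines()

delegations = parse_ns_records(SAMPLE)
print(find_domains(delegations, "garden"))
print(find_domains(delegations, "garden", ns_suffix="hosting-a.example"))
```

For a real zone file you would pass an open file object to parse_ns_records instead of SAMPLE. A file the size of com.zone may not fit comfortably in a dictionary, so filtering line by line while reading is probably the better design at that scale.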

Arguably a web comprising a large number of small, diverse websites, where each user may be visiting a variety of different websites, is less suitable for advertising than one where all web users are funneled through a few large websites that survive by selling online ad services, like Google. It stands to reason that those large online ad services sites would have little interest in showing users an undiscovered portion of the web. They want users to congregate on "popular" sites. Good for advertising.

OTOH, using zone files instead of a search engine, social media or news aggregator site in the online ads (or VC) business, one can see all websites that have registered an ICANN domain name. No filters. No advertising-related algorithms. Popularity is irrelevant. The user determines relevance, not a third party.

2 comments

Is this something that could be scripted with some foo to be a basic search engine in itself?

Anyone know if something like this exists?

What I have wanted to do for some time, well over a decade, is to create a search engine that just searches page titles. Not as a substitute for any other search engine but as a high throughput discovery tool to screen for websites which can then be explored and searched further.
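A minimal sketch of what a title-only search could look like, using just the Python standard library. Nothing here is an existing tool: TitleParser, extract_title, fetch_title and search_titles are hypothetical names, the in-memory dict stands in for a real index, and the sample pages are made up. fetch_title assumes the site answers plain HTTP on the bare domain, which many will not.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with whitespace collapsed, or '' if none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def fetch_title(domain, timeout=10):
    """Fetch the start of a page and extract its title (needs network)."""
    with urlopen(f"http://{domain}/", timeout=timeout) as resp:
        html = resp.read(65536).decode("utf-8", errors="replace")
    return extract_title(html)


def search_titles(titles, query):
    """Return domains whose title contains every word of the query."""
    words = query.lower().split()
    return sorted(
        domain for domain, title in titles.items()
        if all(word in title.lower() for word in words)
    )


# Made-up pages for illustration only; no network access needed here.
pages = {
    "gardenshed.example": "<html><head><title> Garden Sheds\n and More </title></head></html>",
    "carparts.example": "<title>Discount Car Parts</title>",
}
titles = {domain: extract_title(html) for domain, html in pages.items()}
print(search_titles(titles, "garden sheds"))
```

In practice the titles dict would be filled by calling fetch_title over a list of domains, with error handling for sites that time out or refuse connections.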

There are, for example, search engines that search for strings in the page source, e.g., to detect use of certain JavaScript files. These are slow and not free.

Wouldn't the Google intitle/allintitle search operators work for this? Or am I missing something?
How can I get a zone file?
Today it is easy to create zone files from free, publicly available internet scan data, e.g., scans.io and censys.io. This is arguably a better solution than zone files. Not every domain name in a zone file necessarily corresponds to a website, whereas scans let you focus only on websites.

Requesting zone files from the registry was the traditional method. ICANN tries to require registries to provide them to the public, with limited success. Downloading com.zone/net.zone from Verisign should be relatively straightforward (not sure if edu.zone is available anymore). However, with gTLDs there are hundreds of registries now, with potentially hundreds of different rules on zone file access; some registries, like those for ccTLDs, never had zone file access programs. Even registries that seem like they would be easy to deal with can have silly restrictions, e.g., the .org registry used to have a requirement that the requester needed to have a stable IP address.

s/stable/static/
You can request a copy from ICANN through the Centralized Zone Data Service (CZDS). It's a pretty neat service and will give you access to zone files for a few months; you just need to file a request for each TLD you are interested in seeing.

Sometimes larger TLDs take a bit longer to respond to requests, whereas some others automatically accept all requests.

https://www.icann.org/resources/pages/czds-2014-03-03-en
https://czds.icann.org/home