by nocommandline 1143 days ago
Read the announcement and it looks like there isn't an option to submit a site for crawling. If that's true, how do they discover new sites? My understanding of the Web Discovery Project is that they anonymously index your searches and the results you click. But a new site won't show up in your search results in the first place, which in turn means it will never be indexed by them.

If you turn on "the Web Discovery Project" in the Brave browser, then a fraction of the web addresses you visit will be sent to Brave, even if the web pages weren't from a Brave Search SERP.

Source: https://support.brave.com/hc/en-us/articles/4409406835469-Wh...

> If you opt-in to the Web Discovery Project, your browser will process the following data on your device, and securely send it to Brave’s servers:

> - A fraction of the addresses (URLs) of the web pages visited in the Brave Browser, along with engagement metrics (how much time is spent on the page)

> - [...]

> then a fraction of the web addresses you visit will be sent to Brave

I get that, but if it's a new site, the number of people visiting will be extremely small, if not zero. The possibilities that I see are:

a) The new website is first mentioned somewhere like social media, you find it there, and then you visit it in the Brave browser

b) You use Google or Bing search within the Brave browser and find the site there (because it was submitted to Google or Bing)

Probably watching for new DNS entries gets you most of the way there. When you fire up any new website you usually get a pile of visits from mysterious cloud boxes in the first 24 hours before you are listed on any search engine. I assume that's how they find you.

How does one watch for new DNS entries? I was under the assumption that iterating the contents of a DNS zone isn't desirable now so is usually disabled/deprecated.

Not a DNS expert but I believe you can get regularly updated copies of zone files from ICANN. To get domains not under their observation I expect you can go out and make deals with registrars on an individual basis. For an individual or small organisation, probably easier would be to subscribe to an API that does it.
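
For what it's worth, once you have zone file copies, "watching for new DNS entries" mostly comes down to diffing two snapshots. Here is a rough Python sketch, assuming you already have yesterday's and today's copies of a TLD zone file as text, in the usual master-file layout with absolute owner names. The function names and the sample records are made up for illustration, and this says nothing about how Brave or anyone else actually does it. Real zone files are very large, so in practice you would stream them rather than hold them in memory.

```python
# Record classes that may appear between the owner name and the record type.
DNS_CLASSES = {"IN", "CH", "HS"}


def registered_domains(zone_text: str, tld: str) -> set[str]:
    """Collect the domains directly under `tld` that have an NS record."""
    suffix = "." + tld.lower().strip(".") + "."
    domains = set()
    for line in zone_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        owner = fields[0].lower()
        # The record type is the first token after the owner that is
        # neither a numeric TTL nor a class such as IN.
        rest = [f.upper() for f in fields[1:]]
        rtype = next(
            (f for f in rest if not f.isdigit() and f not in DNS_CLASSES), None
        )
        # Assumption: only absolute owner names (trailing dot) are handled;
        # relative names would need $ORIGIN tracking, which is skipped here.
        if rtype != "NS" or not owner.endswith(suffix):
            continue
        label = owner[: -len(suffix)].split(".")[-1]
        if label:
            domains.add(label + suffix.rstrip("."))
    return domains


def newly_seen(old_zone: str, new_zone: str, tld: str) -> list[str]:
    """Domains delegated in the new snapshot but absent from the old one."""
    return sorted(registered_domains(new_zone, tld) - registered_domains(old_zone, tld))


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real zone file.
    yesterday = (
        "com. 86400 IN NS a.gtld-servers.net.\n"
        "example.com. 172800 IN NS ns1.example.com.\n"
    )
    today = yesterday + (
        "freshsite.com. 172800 IN NS ns1.hosting.test.\n"
        "freshsite.com. 172800 IN NS ns2.hosting.test.\n"
    )
    print(newly_seen(yesterday, today, "com"))
```

The output of `newly_seen` is a list of candidate domains to crawl. It only tells you a name was delegated, not that a website is actually being served there.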