| HN Mirror

	by sangnoir 804 days ago
	There are fewer than 10 links on each domain, so how did GPTBot find out about the 1.8M unique sites? By crawling the sites it's not supposed to crawl, ignoring robots.txt. "disallow: /" doesn't mean "you may peek at the homepage to find outbound links that may have a different robots.txt".
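
	To illustrate the "disallow: /" point, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the example.com URLs, and the may_fetch helper are all made up for illustration; this says nothing about how GPTBot itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks GPTBot from the entire site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` is allowed to fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The homepage is covered by "Disallow: /" just like every other path.
home_allowed = may_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/")
page_allowed = may_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/links.html")

# A crawler with no matching rule (illustrative name) is not restricted.
other_allowed = may_fetch(ROBOTS_TXT, "SomeOtherBot", "https://example.com/")

print(home_allowed, page_allowed, other_allowed)
```

	Under these assumptions a compliant crawler never gets to read the homepage, so it has no way to collect outbound links from it.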