HN Mirror

Hacker News


	by jeffnappi 804 days ago
	His site has a subdomain for every page, and the crawler treats each of those subdomains as a unique site.

2 comments

sangnoir 804 days ago

There are fewer than 10 links on each domain, so how did GPTBot find out about the 1.8M unique sites? By crawling sites it's not supposed to crawl and ignoring robots.txt. "disallow: /" doesn't mean "you may peek at the homepage to find outbound links that may have a different robots.txt".
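To make the argument concrete: robots.txt is fetched and evaluated per host, and "Disallow: /" matches every path, including the homepage. A compliant crawler therefore never requests a disallowed homepage and never sees its outbound links. Below is a minimal sketch using Python's standard-library `urllib.robotparser`. The hostnames (`page1/2/3.example.com`), the robots.txt bodies, the link graph, and the helpers `may_fetch` and `crawl` are all hypothetical, made up for illustration and not taken from the site under discussion.

```python
from collections import deque
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, keyed by origin. Each subdomain serves its own file.
ROBOTS_BY_ORIGIN = {
    "https://page1.example.com": "User-agent: GPTBot\nDisallow: /\n",
    "https://page2.example.com": "User-agent: *\nAllow: /\n",
}

# Hypothetical link graph: the outbound links found on each page once it is fetched.
LINKS = {
    "https://page1.example.com/": ["https://page2.example.com/", "https://page3.example.com/"],
    "https://page2.example.com/": [],
}

_parsers: dict[str, RobotFileParser] = {}


def may_fetch(user_agent: str, url: str) -> bool:
    """Check url against the robots.txt of its own origin (scheme + host + port)."""
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.netloc}"
    if origin not in _parsers:
        rp = RobotFileParser()
        # A host with no entry here gets an empty robots.txt, which allows everything.
        rp.parse(ROBOTS_BY_ORIGIN.get(origin, "").splitlines())
        _parsers[origin] = rp
    return _parsers[origin].can_fetch(user_agent, url)


def crawl(user_agent: str, seeds: list[str]) -> list[str]:
    """Breadth-first crawl that only learns links from pages it was allowed to fetch."""
    fetched: list[str] = []
    seen = set(seeds)
    frontier = deque(seeds)
    while frontier:
        url = frontier.popleft()
        if not may_fetch(user_agent, url):
            continue  # Disallowed: the page is never requested, so its links stay unknown.
        fetched.append(url)
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched


if __name__ == "__main__":
    print(crawl("GPTBot", ["https://page1.example.com/"]))
    print(crawl("OtherBot", ["https://page1.example.com/"]))
```

Seeded at the disallowed homepage, the compliant "GPTBot" crawl fetches nothing and discovers no further hosts. A crawler that is not blocked follows the homepage's links to the other subdomains. Discovering millions of subdomains therefore requires fetching pages whose robots.txt says not to.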

jameshart 804 days ago

Of course it’s treating them as unique sites. They are unique sites.