Hacker News
by jeffnappi 804 days ago
His site has a subdomain for every page, and the crawler is treating each of those as a unique site.
2 comments

There are fewer than 10 links on each domain, so how did GPTBot find out about the 1.8M unique sites? By crawling sites it's not supposed to crawl, ignoring robots.txt. "disallow: /" doesn't mean "you may peek at the homepage to find outbound links that may have a different robots.txt".
Of course it’s considering them as unique sites. They are unique sites.
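To make both points above concrete, here is a rough sketch of per-host robots.txt handling using Python's standard urllib.robotparser. The hostnames, the rule text, and the helper names (make_rules, may_fetch, rules_by_host) are made up for illustration; this is not how GPTBot or any particular crawler is implemented. It only shows that rules are looked up per host, and that "Disallow: /" covers the homepage too.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def make_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical hosts: each subdomain is its own site with its own robots.txt.
rules_by_host = {
    "a.example.com": make_rules("User-agent: GPTBot\nDisallow: /\n"),
    "b.example.com": make_rules("User-agent: *\nAllow: /\n"),
}


def may_fetch(agent: str, url: str) -> bool:
    """Check the rules of the URL's own host, never a neighbouring host's."""
    host = urlsplit(url).hostname
    rules = rules_by_host.get(host)
    if rules is None:
        # Assumption for this sketch: refuse until that host's robots.txt
        # has been fetched and parsed.
        return False
    return rules.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("http://a.example.com/", "http://b.example.com/"):
        print(url, may_fetch("GPTBot", url))
```

Under these assumed rules, the homepage of a.example.com is off limits to GPTBot, so the crawler has no legitimate way to read outbound links from it, while b.example.com is judged separately by its own file.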