Hacker News new | ask | show | jobs
by ChuckMcM 3876 days ago
Well every crawler has to have this list, the Blekko crawler tries to keep these pages out of the index (with varying levels of success). But its not particularly useful for non-crawlers, and since every crawler will have a way of evaluating hosts (possibly uniquely) it isn't really transportable.

That said, if you have ever wondered why domains that used to have web sites on them suddenly become huge spam havens, it is because spammers buy up the domain as soon as it expires and try to exploit its previous reputation as a non-spam site, to push link authority into some (generally Google's) crawl.