Hacker News
by Arnt 4105 days ago
True, you just need a subset. Now how do you identify that subset without indexing the pages to find out whether each page is in the subset you need?

IIRC, Google used to scan different pages at very different frequencies, quite possibly because it assigns pages to subsets every time it indexes them.
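To make the idea concrete, here is a minimal sketch of a scheduler where subset membership is a by-product of indexing: each time a page is indexed, it is reassigned to a recrawl tier, and that tier decides when it is scanned next. The three tiers (1, 7 and 30 days), the "content changed since last time" test via a SHA-256 hash, and the names `CrawlScheduler`, `reassign_tier` and `PageState` are all hypothetical illustrations, not a description of what Google actually did.

```python
import hashlib
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical recrawl tiers, in days: tier 0 is rescanned most often.
TIER_INTERVALS_DAYS = [1, 7, 30]


@dataclass
class PageState:
    url: str
    tier: int = 1                    # new pages start in the middle tier
    last_hash: Optional[str] = None  # hash of the content seen at last index


def reassign_tier(state: PageState, content: str) -> int:
    """Re-evaluate which subset (tier) a page belongs to on every index.

    Assumption: pages whose content changed move to a faster tier,
    unchanged pages drift to a slower one.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if state.last_hash is not None:
        if digest == state.last_hash:
            state.tier = min(state.tier + 1, len(TIER_INTERVALS_DAYS) - 1)
        else:
            state.tier = max(state.tier - 1, 0)
    state.last_hash = digest
    return state.tier


class CrawlScheduler:
    """Keeps a min-heap of (next_crawl_day, url), ordered by due day."""

    def __init__(self) -> None:
        self.pages: Dict[str, PageState] = {}
        self.queue: List[Tuple[int, str]] = []

    def index(self, url: str, content: str, today: int) -> int:
        """Index a page, reassign its tier, and schedule its next crawl."""
        state = self.pages.setdefault(url, PageState(url))
        tier = reassign_tier(state, content)
        next_day = today + TIER_INTERVALS_DAYS[tier]
        heapq.heappush(self.queue, (next_day, url))
        return next_day

    def due(self, today: int) -> List[str]:
        """Pop every URL whose scheduled crawl day has arrived."""
        out = []
        while self.queue and self.queue[0][0] <= today:
            out.append(heapq.heappop(self.queue)[1])
        return out
```

Note that the scheduler never needs to know a page's subset in advance: a page that keeps changing ends up rescanned daily, a static one drifts to the monthly tier, and both facts are only learned by indexing the page.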