by Waterluvian 745 days ago
Is there some 80/20 rule for web indexing?

I’m not saying having deep per-page indexing of Reddit, for example, isn’t useful. But is there any value in a breadth-focused index that is far cheaper to maintain?


Almost certainly. Internet search is above all a problem of improving the signal-to-noise ratio.

There's an inordinate number of documents that will never be a good search result for any query. That covers not only trivial pages with barely anything in them to index, but also sign-up forms, cookie policies, and redundant information (e.g. any given man page exists in dozens if not hundreds of identical copies on the web).
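
To make that concrete, here is a rough sketch of the kind of pre-index filter this implies: drop near-empty pages and exact duplicates before they reach the index. Everything in it is an assumption for illustration (the `filter_documents` name, the `MIN_TOKENS` cutoff, hashing whitespace-normalized text); a real search engine would likely use near-duplicate detection and much richer quality signals.

```python
import hashlib
import re

MIN_TOKENS = 20  # hypothetical cutoff for "barely anything to index"


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_documents(docs: dict[str, str], min_tokens: int = MIN_TOKENS) -> dict[str, str]:
    """Return only documents that are long enough and not exact duplicates.

    `docs` maps URL -> page text. The first copy of each document wins;
    later identical copies (after normalization) are dropped.
    """
    seen_hashes: set[str] = set()
    kept: dict[str, str] = {}
    for url, text in docs.items():
        norm = normalize(text)
        # Skip pages with too little text to be a useful result.
        if len(norm.split()) < min_tokens:
            continue
        # Skip pages whose normalized content was already seen.
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        kept[url] = text
    return kept
```

This only catches byte-identical copies after normalization, so mirrored man pages wrapped in different site templates would still slip through; it is meant to show the shape of the idea, not a working crawler policy.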

> cookie policies

Unless you're specifically searching for other websites' cookie policies (e.g. to understand how they work, to do research on them, or simply to copy them...)