Hacker News new | ask | show | jobs
by tedunangst 1049 days ago
How does Wikipedia manage to remain indexed?
7 comments

Google is paying Wikipedia through "Wikimedia Enterprise." If Wikipedia weren't able to sucker people into thinking that they're poverty-stricken, Google would probably prop it up like they do Firefox.
Google search still prefers to give me at least 2-3 blogspam pages before the Wikipedia page with exactly the same keywords in the title as my query.
If I were establishing a "crawl budget", it would be adjusted by value. If you're consistently serving up hits as I crawl, I'll keep crawling. If it's a hundred pages that will basically never be a first page result, maybe not.

Wikipedia had a long tail of low-value content, but even the low-value content tends to be among the highest value for its given focus. e.g., I don't know how many people search "Danish trade monopoly in Iceland", and the Wikipedia article on it isn't fantastic, but it's a pretty good start[0]. Good enough to serve up as the main snippet on Google.

[0] https://en.wikipedia.org/wiki/Danish_trade_monopoly_in_Icela...

Wikipedia’s strongest SEO weapon is how often wiki links get clicked on result pages, with no return.

They’re just truly useful pages, and that is reflected in how people interact with them.

Purely speculating, Wikipedia has a huge number of inbound links (likely many more than CNet or even than more popular sites) which crawler allocation might be proportionate to. Even if it only crawled pages that had a specific link from an external site, that would be enough for Google to get pretty good coverage of Wikipedia.
Very likely Google special-cases Wikipedia
Your site isn’t worthy of the same crawl budget as Wikipedia.