

	by tedunangst 1049 days ago
	How does Wikipedia manage to remain indexed?

7 comments

pessimizer 1049 days ago

Google is paying Wikipedia through "Wikimedia Enterprise." If Wikipedia weren't able to sucker people into thinking that they're poverty-stricken, Google would probably prop it up like they do Firefox.


sznio 1049 days ago

Google search still prefers to give me at least 2-3 blogspam pages before the Wikipedia page with exactly the same keywords in the title as my query.


lkbm 1049 days ago

If I were establishing a "crawl budget", it would be adjusted by value. If you're consistently serving up hits as I crawl, I'll keep crawling. If it's a hundred pages that will basically never be a first-page result, maybe not.

Wikipedia has a long tail of low-value content, but even that low-value content tends to be among the most valuable pages for its particular topic. For example, I don't know how many people search "Danish trade monopoly in Iceland", and the Wikipedia article on it isn't fantastic, but it's a pretty good start[0]. It's good enough to serve up as the main snippet on Google.

[0] https://en.wikipedia.org/wiki/Danish_trade_monopoly_in_Icela...
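A minimal sketch of the value-adjusted crawl budget described above, assuming a per-site budget (pages per crawl cycle) that doubles when enough crawled pages turn out to be "hits" and halves otherwise. The function name, the hit-rate threshold, the doubling/halving rule, and the clamping bounds are all hypothetical choices for illustration, not how any real search engine works.

```python
def adjust_budget(budget: int, hits: int, crawled: int,
                  target_rate: float = 0.1,
                  min_budget: int = 10,
                  max_budget: int = 1_000_000) -> int:
    """Return the next crawl budget (pages per cycle) for one site.

    hits: crawled pages that proved valuable, e.g. later served as
    first-page results (a hypothetical signal, assumed for this sketch).
    """
    if crawled == 0:
        # Nothing crawled this cycle, so there is no evidence either way.
        return budget
    hit_rate = hits / crawled
    if hit_rate >= target_rate:
        new_budget = budget * 2   # site keeps paying off: crawl more of it
    else:
        new_budget = budget // 2  # mostly dead weight: back off
    # Keep the budget within fixed (hypothetical) bounds.
    return max(min_budget, min(max_budget, new_budget))
```

For example, `adjust_budget(1000, 500, 1000)` grows the budget to 2000, while `adjust_budget(1000, 2, 1000)` shrinks it to 500. A site like Wikipedia, where even obscure pages often rank, would keep clearing the threshold and so keep its budget high.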


snowwrestler 1049 days ago

Wikipedia’s strongest SEO weapon is how often wiki links get clicked on result pages without the searcher coming back to the results.

They’re just truly useful pages, and that is reflected in how people interact with them.


lmm 1049 days ago

Purely speculating, Wikipedia has a huge number of inbound links (likely many more than CNet or even than more popular sites) which crawler allocation might be proportionate to. Even if it only crawled pages that had a specific link from an external site, that would be enough for Google to get pretty good coverage of Wikipedia.
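To make the speculation concrete, here is a minimal sketch of splitting a fixed crawl budget across sites in proportion to their inbound link counts. The function name and the example link counts are made up for illustration; nothing here reflects a known Google mechanism.

```python
def allocate_budget(total: int, inbound_links: dict[str, int]) -> dict[str, int]:
    """Split `total` pages across sites in proportion to inbound links.

    Integer floor division means the shares can sum to slightly less
    than `total`; a real scheduler would redistribute the remainder.
    """
    total_links = sum(inbound_links.values())
    if total_links == 0:
        # No link signal at all: give every site nothing.
        return {site: 0 for site in inbound_links}
    return {site: total * links // total_links
            for site, links in inbound_links.items()}
```

With hypothetical counts, `allocate_budget(1000, {"wikipedia.org": 900, "cnet.com": 100})` gives Wikipedia 900 pages and CNet 100, so a site with far more inbound links gets proportionally more of the crawl.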


skissane 1049 days ago

Very likely Google special-cases Wikipedia.


ericd 1049 days ago

Your site isn’t worthy of the same crawl budget as Wikipedia.
