by gojomo
4695 days ago
Yes and no. Some crawlers are most interested in the freshest versions of the most-inlinked articles, or in the exact HTML presentation at Wikipedia. The monthly full raw wikitext dumps don't provide that. And Wikipedia's serving plant is pretty efficient, with bandwidth being only a small portion of their costs. They can afford some crawling... and correspondingly, their /robots.txt is pretty open. Good crawlers seeking just the bulk text shouldn't try to grab the whole thing as fast as possible via the standard web URLs... but other good crawlers may want or need to visit discovered Wikipedia links, and doing so at a measured pace should be OK.
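
To make "a measured pace" concrete, here is a rough sketch of the two checks a well-behaved crawler might apply before each request: consult the site's robots.txt rules, and enforce a minimum gap between fetches. The `PoliteFetcher` class, the `ExampleBot` user agent, the two-second delay, and the sample rules are all hypothetical, chosen for illustration; they are not Wikipedia's actual robots.txt or any documented rate limit. Only `urllib.robotparser` from the Python standard library is used.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteFetcher:
    """Hypothetical helper: gates requests on robots.txt and a minimum delay."""

    def __init__(self, robots_lines, user_agent, min_delay_seconds=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        # robots_lines is the robots.txt body split into lines; a real crawler
        # would download it from the site first.
        self.parser = RobotFileParser()
        self.parser.parse(robots_lines)
        self.user_agent = user_agent
        self.min_delay_seconds = min_delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request = None

    def allowed(self, url):
        """Return True if the parsed rules permit this user agent to fetch url."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait_turn(self):
        """Block until at least min_delay_seconds have passed since the last turn."""
        now = self._clock()
        if self._last_request is not None:
            remaining = self.min_delay_seconds - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request = now


# Illustrative rules only (an assumption, not the real file).
SAMPLE_RULES = ["User-agent: *", "Disallow: /w/"]

if __name__ == "__main__":
    fetcher = PoliteFetcher(SAMPLE_RULES, "ExampleBot", min_delay_seconds=2.0)
    for url in ["https://en.wikipedia.org/wiki/Web_crawler",
                "https://en.wikipedia.org/w/index.php"]:
        if fetcher.allowed(url):
            fetcher.wait_turn()
            print("would fetch", url)
        else:
            print("skipping", url)
```

The actual HTTP request is left out on purpose; the point is only that the robots.txt check and the delay sit in front of every fetch, whatever the right delay turns out to be for a given site.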