Hacker News
by decad 5038 days ago
This is very interesting, although anyone aiming to crawl Wikipedia should make sure they read this section on the Database download page. http://en.wikipedia.org/wiki/Wikipedia:Database_download#Why...

Everything should be fine as long as you respect their limit of 1 request per second and their robots.txt.
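For anyone wondering what that looks like in practice, here is a rough sketch in Python using only the standard library. It is not taken from the project under discussion; `RateLimiter` and `make_robots_checker` are hypothetical names made up for illustration, and the 1-second default just mirrors the rule mentioned above.

```python
import time
import urllib.robotparser


class RateLimiter:
    """Spaces successive wait() calls at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the limiter can be tested without real delays.
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request; None before the first one

    def wait(self):
        """Call immediately before each request; sleeps only if the last one was too recent."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def make_robots_checker(robots_lines, user_agent):
    """Build a url -> bool function from the lines of an already-downloaded robots.txt."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return lambda url: parser.can_fetch(user_agent, url)
```

A crawler would then call the checker on each URL, skip anything it rejects, and call `limiter.wait()` right before every request it does make. Keeping the limiter separate from the fetch code means the same object can be shared by every code path that hits the site.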

1 comment

A quick skim of the source shows that rate-limiting is not implemented, so the code does not comply with Wikipedia's crawling rules.
Thanks for the heads up. I'll add rate-limiting directly into the API.