by nevermore 4679 days ago
Please note that this API does not make any specific attempt to obey the MediaWiki API etiquette (http://www.mediawiki.org/wiki/API:Etiquette). This sort of API is easy and clean for something like a command-line script, but if you're going to do further automation or crawling, I strongly recommend using the pywikipediabot library (http://www.mediawiki.org/wiki/Manual:Pywikipediabot), which includes a very full API, has tunable throttling, and makes a more direct attempt to require a user agent string that is in line with the API etiquette.
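
To make the etiquette point concrete, here is a rough sketch of the two habits mentioned above (an identifying user agent and throttled, serial requests) using only the Python standard library. This is not how pywikipediabot does it; the user agent string, the one-second interval, and the helper names are all placeholders I made up for illustration.

```python
import time
import urllib.parse
import urllib.request

API_URL = "http://en.wikipedia.org/w/api.php"
# Hypothetical identifier: substitute your own tool name and contact address.
USER_AGENT = "wp-lookup-example/0.1 (contact: you@example.org)"


def build_request(titles, user_agent=USER_AGENT):
    """Build (but do not send) a query request with an identifying User-Agent."""
    params = {
        "action": "query",
        "titles": "|".join(titles),
        "format": "json",
        "maxlag": "5",  # ask the server to reject the request when it is lagged
    }
    url = API_URL + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


class Throttle:
    """Enforce a minimum delay between consecutive calls to wait()."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def fetch_all(title_batches, min_interval=1.0):
    """Send one request per batch, serially, pausing between requests."""
    throttle = Throttle(min_interval)
    results = []
    for batch in title_batches:
        throttle.wait()
        with urllib.request.urlopen(build_request(batch)) as resp:
            results.append(resp.read())
    return results
```

Something like fetch_all([["Positivism"], ["Positive K"]]) would then issue two requests about a second apart. For anything heavier than that, the library recommendation above still stands.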

If you just want a bash script to look things up on wikipedia, you can always use something like

function wp { curl "http://en.wikipedia.org/wiki/$(echo "$@" | tr ' ' '_')" | gunzip | html2text; }

which will work for basic queries (it does no URL encoding, and the words must already be properly capitalized).
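
For the URL-encoding gap, a minimal sketch in Python of building the article URL safely. The function name article_url is just an illustration; it replaces whitespace with underscores the way the tr call does and then percent-encodes everything else. It does not attempt to fix capitalization.

```python
import urllib.parse

BASE_URL = "http://en.wikipedia.org/wiki/"


def article_url(query):
    """Turn a free-text query into an article URL with a percent-encoded title."""
    # Collapse runs of whitespace into single underscores, as tr ' ' '_' would.
    title = "_".join(query.split())
    # safe="" so that characters such as "/" and "+" are encoded too.
    return BASE_URL + urllib.parse.quote(title, safe="")


print(article_url("Positive airway pressure"))
print(article_url("C++"))
```

The second call shows why the encoding matters: the plus signs come out as %2B instead of being passed through raw.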

A full api reference is here (http://en.wikipedia.org/w/api.php).

3 comments

Hi (creator here),

Thanks for bringing this to my attention. I've added a disclaimer to the GitHub page regarding Pywikipediabot and plan to make changes to fully comply with MediaWiki API etiquette. The last thing I'd want to do is inadvertently cause problems for the site or foundation.

I would go one step further and suggest that people who need structured queries use the Google BigQuery API to query its structured Wikipedia data. Granted, the public dataset is from 2010, so it is slightly outdated, but you can write structured SQL against all of the Wikipedia article metadata and then use the MediaWiki API itself to grab only the article text that you're interested in.
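
As a sketch of the second half of that workflow: once the SQL query has given you ids, you can build one MediaWiki API request for just those pages. This assumes the id column in the public dataset corresponds to the MediaWiki page id, which I have not verified; content_url is a made-up helper name, and the ids in the usage line are taken from the sample results further down.

```python
import urllib.parse

API_URL = "http://en.wikipedia.org/w/api.php"


def content_url(page_ids):
    """Build an API URL requesting the current wikitext of the given page ids."""
    params = {
        "action": "query",
        "pageids": "|".join(str(i) for i in page_ids),
        "prop": "revisions",
        "rvprop": "content",
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


print(content_url([464347, 10008223]))
```

Batching several ids into one request like this keeps the number of calls down, which is also friendlier to the API.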

The wikipedia data is hosted here: https://bigquery.cloud.google.com/table/publicdata:samples.w...

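Here is a sample query, searching for articles whose titles start with Positiv (the trailing e* in the pattern matches zero or more e's, which is why Positivism shows up in the results; use r'^Positive' to require the whole word):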

SELECT id,title FROM [publicdata:samples.wikipedia] WHERE (REGEXP_MATCH(title,r'^Positive*')) LIMIT 10

Query complete (2.0s elapsed, 9.13 GB processed)

  1|	464347|	Positive airway pressure	 
  2|	10008223|	Positive behavior support	 
  3|	464347|	Positive airway pressure	 
  4|	1354851|	Positivism in Poland	 
  5|	1023857|	Positive set theory	 
  6|	5154273|	Positivism dispute	 
  7|	2871407|	Positivism	 
  8|	17179765|	Positive psychological capital	 
  9|	9033239|	Positive Action Group	 
  10|	4163012|	Positive K
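
A quick way to see what the pattern in that query actually matches is to run it over the titles above. This sketch uses Python's re module rather than BigQuery's regex engine, but for a pattern this simple the two should agree.

```python
import re

# Titles copied from the query results above.
titles = [
    "Positive airway pressure",
    "Positive behavior support",
    "Positive airway pressure",
    "Positivism in Poland",
    "Positive set theory",
    "Positivism dispute",
    "Positivism",
    "Positive psychological capital",
    "Positive Action Group",
    "Positive K",
]

loose = re.compile(r"^Positive*")  # "Positiv" followed by zero or more "e"
strict = re.compile(r"^Positive")  # the literal prefix "Positive"

# Titles the query pattern accepts but a strict prefix match would reject.
only_loose = [t for t in titles if loose.search(t) and not strict.search(t)]
print(only_loose)
```

The titles it prints are the three Positivism entries, so dropping the trailing * from the query would restrict it to titles that really begin with Positive.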
Here is the python API documentation: https://developers.google.com/api-client-library/python/
> If you just want a bash script to look things up on wikipedia

or for basic description:

    wp() { dig +short txt "$*".wp.dg.cx; }
+1 thank you