by fox91 4598 days ago
Why would I need the dump in JSON? XML is for sure uglier but at least there are efficient SAX parsers.

I would like to know this as well. What is gained by using this tool... converting to JSON?
JSON is cool and hip.

More seriously, it has been gaining popularity because of its direct relationship to the built-in data structures of most programming languages.

For example: XML doesn't have a concept of arrays, but stores collections in its own special way, typically as repeated child elements. It's up to the parser to determine how these should be referenced and interacted with, while with JSON there's no question: it's an array.
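To make the array point concrete, here is a minimal sketch using only Python's standard library. The `article`, `title`, `link`, and `links` names are made up for illustration and are not taken from the actual dump.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical records, invented for this comparison.
xml_text = "<article><title>Example</title><link>A</link><link>B</link></article>"
json_text = '{"title": "Example", "links": ["A", "B"]}'

# XML: the collection is just repeated <link> elements, so the caller
# has to know to gather them into a list.
root = ET.fromstring(xml_text)
xml_links = [el.text for el in root.findall("link")]

# JSON: the array maps directly onto a Python list.
json_links = json.loads(json_text)["links"]

print(xml_links, json_links)
```

Both end up as the same list, but only the JSON version says "this is a list" in the data itself; with a single `<link>` the XML gives no hint whether one value or a collection was intended.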

For example, now I know for a fact I can easily manipulate Wikipedia's data dump using Python thanks to this JSON change.

    import json

    # wikidump is assumed to be a str holding the JSON text of the dump
    js = json.loads(wikidump)

That's it. The only reason it's so simple is thanks to this JSON conversion.

XML? No clue how to manipulate it properly. I could figure it out of course, but now I don't have to. JSON + Python just works™.

Yes, and it also makes it easy to manipulate the dump using Hadoop and to retrieve some values for all the articles using jq (http://stedolan.github.io/jq/). Could you do the same with XML? By the way, you can find a small sample of the dump here: https://dl.dropboxusercontent.com/u/4663256/tmp/json-wikiped...
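A rough Python sketch of that kind of per-article extraction, assuming the dump stores one JSON object per line. That layout, the `title` and `id` fields, and the `iter_field` helper are all assumptions for illustration, not something confirmed about the real dump.

```python
import io
import json

def iter_field(lines, field):
    """Yield the value of `field` from each JSON record, one record per line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if field in record:
            yield record[field]

# Stand-in for an open dump file; the records are invented sample data.
sample = io.StringIO('{"title": "A", "id": 1}\n{"title": "B", "id": 2}\n')
titles = list(iter_field(sample, "title"))
print(titles)
```

If the dump really is line-delimited, each line can be parsed on its own, which is the property that would let a line-oriented job (a Hadoop streaming mapper, say) process it in independent chunks.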