Hacker News
by bastawhiz 3327 days ago
Consider XML entity bombs. You need to explicitly tell your XML parser not to follow the spec to prevent malicious sources of XML from crashing your application. XML also has a lot of room for syntax errors, with many types of tokens and escape rules. JSON, by comparison, does not.
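For anyone who hasn't seen one, here is a deliberately tiny entity bomb, as a rough sketch. The helper `build_entity_bomb` and its parameters are made up for illustration. It assumes Python's stdlib `xml.etree.ElementTree`, which expands internal entities by default. With three levels and a fanout of 10 the expansion is harmless; a real attack uses the same trick with more levels.

```python
import xml.etree.ElementTree as ET


def build_entity_bomb(levels: int, fanout: int) -> str:
    """Build a small nested-entity document (hypothetical helper, for illustration)."""
    entities = ['<!ENTITY e0 "lol">']
    for i in range(1, levels + 1):
        # Each entity refers to the previous one `fanout` times,
        # so the expanded size grows as fanout ** levels.
        refs = f"&e{i - 1};" * fanout
        entities.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "\n".join(entities)
    return f"<!DOCTYPE bomb [\n{dtd}\n]>\n<bomb>&e{levels};</bomb>"


# Kept intentionally small so it is safe to run.
payload = build_entity_bomb(levels=3, fanout=10)
root = ET.fromstring(payload)
expanded = root.text

print("payload characters: ", len(payload))
print("expanded characters:", len(expanded))
```

Every extra level multiplies the output by the fanout while adding only one short line to the input, which is why the parser has to be told not to do what the spec says.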

> XML also has a lot of room for syntax errors, with many types of tokens and escape rules. JSON, by comparison, does not.

Parsing JSON is a minefield.

Take a look at how a bunch of parsers perform with various payloads: http://seriot.ch/json/pruned_results.png. The yellow and light blue boxes highlight the worst situations for applications using the specified parser.

"JSON is the de facto standard when it comes to (un)serialising and exchanging data in web and mobile programming. But how well do you really know JSON? We'll read the specifications and write test cases together. We'll test common JSON libraries against our test cases. I'll show that JSON is not the easy, idealised format as many do believe. Indeed, I did not find two libraries that exhibit the very same behaviour. Moreover, I found that edge cases and maliciously crafted payloads can cause bugs, crashes and denial of services, mainly because JSON libraries rely on specifications that have evolved over time and that left many details loosely specified or not specified at all."

More details available at: http://seriot.ch/parsing_json.php

None of these issues are as bad as the XML ones. You generally don't need "defusedjson" like you need https://pypi.python.org/pypi/defusedxml

<!DOCTYPE external [ <!ENTITY ee SYSTEM "file:///etc/ssh/ssh_host_ed25519_key"> ]> <root>&ee;</root>
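One blunt way to defuse a payload like that is to refuse any document that carries a DOCTYPE at all. The sketch below uses only the standard library and is not how defusedxml is actually implemented; `DoctypeForbidden` and `parse_without_dtd` are hypothetical names. It assumes you never need DTDs from untrusted input.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration (hypothetical)."""


def parse_without_dtd(xml_text: str) -> ET.Element:
    """Parse XML, but reject any document that declares a DOCTYPE."""

    def forbid_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    # First pass: a bare expat parser whose only job is to trip on a DOCTYPE.
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = forbid_doctype
    scanner.Parse(xml_text, True)

    # Second pass: the document has no DTD, so no custom entities can exist.
    return ET.fromstring(xml_text)


xxe = (
    '<!DOCTYPE external [ <!ENTITY ee SYSTEM '
    '"file:///etc/ssh/ssh_host_ed25519_key"> ]> <root>&ee;</root>'
)

try:
    parse_without_dtd(xxe)
    blocked = False
except DoctypeForbidden:
    blocked = True

print("XXE payload blocked:", blocked)
print("plain document text:", parse_without_dtd("<root>ok</root>").text)
```

Parsing twice is wasteful, but it keeps the sketch independent of parser internals. No DTD means no entity declarations, which rules out both external entities and entity bombs.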

Parser correctness is irrelevant when you're talking about the ability to be written with few syntax errors. For instance, JSON has one type of string with one set of string escape rules. XML has element names, attribute names, attribute values, text nodes, CDATA content, RCDATA content, and more. And almost all of them have different rules for what they can contain and how they can be used.
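To make the "one set of escape rules" point concrete, here is a small sketch using Python's stdlib `json` and `xml.sax.saxutils`. The sample string is made up. The same value needs one escaping routine for JSON, but different ones depending on whether it lands in an XML text node or an attribute value.

```python
import json
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

value = 'Tom & "Jerry" <3'

# JSON: one string type, one escaping routine.
as_json = json.dumps(value)

# XML: the rules depend on where the value goes.
as_text = escape(value)      # text node: & and < must be escaped, quotes may stay
as_attr = quoteattr(value)   # attribute: the quote character matters as well

document = f"<show title={as_attr}>{as_text}</show>"
parsed = ET.fromstring(document)

print(as_json)
print(document)
```

Element names, CDATA sections and comments each add their own restrictions on top of these two cases.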

By comparison, XML is orders of magnitude more complex than JSON.

> XML also has a lot of room for syntax errors,

No it doesn't. XML is either well formed or not, and any parser encountering non well-formed XML will reject it outright.

Therefore all XML in use on the internet is spec-compliant.

Now try to say the same about JSON.
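A quick sketch of the contrast, assuming Python's stdlib parsers. `ElementTree` refuses a document with a mismatched tag, while `json.loads` accepts `NaN` and `Infinity` by default even though the JSON grammar has no such tokens. The `reject_constant` hook is a hypothetical name, passed through the real `parse_constant` parameter.

```python
import json
import math
import xml.etree.ElementTree as ET

# XML: a mismatched tag is a fatal error.
try:
    ET.fromstring("<a><b></a>")
    xml_rejected = False
except ET.ParseError:
    xml_rejected = True

# JSON: the default Python decoder is more lenient than the grammar.
lenient = json.loads("[NaN, Infinity]")


def reject_constant(name):
    """Hypothetical hook that turns NaN/Infinity/-Infinity into errors."""
    raise ValueError(f"{name} is not valid JSON")


try:
    json.loads("[NaN, Infinity]", parse_constant=reject_constant)
    json_rejected = False
except ValueError:
    json_rejected = True

print("malformed XML rejected:", xml_rejected)
print("lenient JSON result:", lenient)
print("strict JSON rejected:", json_rejected)
```

Other JSON libraries draw the line in different places, so a payload that one accepts may be rejected by another.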

> any parser encountering non well-formed XML will reject it outright.

Ah, I see you're new to parsing XML.

Oh, it will be rejected alright. And then you're forced to override the parser, or to manipulate the XML before parsing it because it makes business sense to not have the source fix their XML for some reason.
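The "manipulate the XML before parsing it" route tends to look something like the sketch below. It covers one common breakage, a bare `&` in text, and nothing else. The names `BARE_AMPERSAND` and `repair_bare_ampersands` and the sample document are made up for illustration. The regex does not understand CDATA sections or comments, so it can corrupt content inside them.

```python
import re
import xml.etree.ElementTree as ET

# Match an ampersand that does not start something shaped like an entity
# or character reference (&name; &#123; &#x1F;).
BARE_AMPERSAND = re.compile(
    r"&(?![A-Za-z_:][\w.:-]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)"
)


def repair_bare_ampersands(xml_text: str) -> str:
    """Escape stray ampersands so a strict parser will accept the text."""
    return BARE_AMPERSAND.sub("&amp;", xml_text)


broken = "<vendor><name>Smith & Sons</name><note>5 &lt; 6</note></vendor>"

try:
    ET.fromstring(broken)
    rejected = False
except ET.ParseError:
    rejected = True

root = ET.fromstring(repair_bare_ampersands(broken))

print("original rejected:", rejected)
print("repaired name:", root.find("name").text)
```

References to entities that were never declared, such as `&nbsp;` in a document without a DTD, pass through this filter untouched and still fail in the parser.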

People and machines are just utterly incapable of outputting valid XML.