Consider XML entity bombs. To keep malicious sources of XML from crashing your application, you have to explicitly tell your XML parser not to follow the spec. XML also has a lot of room for syntax errors, with many types of tokens and escape rules. JSON, by comparison, does not.
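For anyone who hasn't had to do this, here is a minimal sketch of what "telling the parser not to follow the spec" can look like in Python, using only the standard library's expat bindings. The names `EntitiesForbidden` and `element_names` are made up for illustration, and rejecting every entity declaration outright is just one possible policy, not the only way to harden a parser.

```python
import xml.parsers.expat


class EntitiesForbidden(ValueError):
    """Raised when a document declares any entity (hypothetical policy)."""


def element_names(data: bytes) -> list:
    """Parse XML and return element names, refusing any entity declaration."""
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def reject_entity(name, is_param, value, base, system_id, public_id, notation):
        # Entity declarations are legal XML, but they are what entity
        # bombs are built from, so this sketch refuses them all.
        raise EntitiesForbidden(name)

    parser.EntityDeclHandler = reject_entity
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)
    return names


# A tiny stand-in for a bomb: one declared entity is enough to trigger the check.
SUSPICIOUS = b'<!DOCTYPE lolz [<!ENTITY lol "lol">]><lolz>&lol;</lolz>'
```

With this in place, `element_names(b"<a><b>ok</b></a>")` parses normally, while `element_names(SUSPICIOUS)` raises before any expansion happens.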
> XML also has a lot of room for syntax errors, with many types of tokens and escape rules. JSON, by comparison, does not.
Parsing JSON is a minefield.
Take a look at how a bunch of parsers perform with various payloads: http://seriot.ch/json/pruned_results.png The yellow and light blue boxes highlight the worst situations for applications using the specified parser.
"JSON is the de facto standard when it comes to (un)serialising and exchanging data in web and mobile programming. But how well do you really know JSON? We'll read the specifications and write test cases together. We'll test common JSON libraries against our test cases. I'll show that JSON is not the easy, idealised format as many do believe. Indeed, I did not find two libraries that exhibit the very same behaviour. Moreover, I found that edge cases and maliciously crafted payloads can cause bugs, crashes and denial of services, mainly because JSON libraries rely on specifications that have evolved over time and that left many details loosely specified or not specified at all."
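To make the "loosely specified" point concrete, here is a small sketch with Python's standard `json` module. By default it accepts `NaN` and silently keeps the last value for a duplicate key; other libraries may choose differently. The wrapper `strict_loads` and its two helpers are hypothetical names, and the two checks shown are only examples of tightening, not a complete strict mode.

```python
import json


def _reject_constant(name):
    # Called by the decoder for 'NaN', 'Infinity' and '-Infinity'.
    raise ValueError(f"non-standard JSON constant: {name}")


def _reject_duplicates(pairs):
    # Receives every object as a list of (key, value) pairs in document order.
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate object key")
    return dict(pairs)


def strict_loads(text):
    """Like json.loads, but refuses NaN/Infinity and duplicate keys."""
    return json.loads(
        text,
        parse_constant=_reject_constant,
        object_pairs_hook=_reject_duplicates,
    )


# Default behaviour of the stdlib parser on the same inputs:
lenient_duplicate = json.loads('{"a": 1, "a": 2}')  # last key wins
lenient_nan = json.loads("NaN")                     # accepted as a float
```

Whether the lenient or the strict behaviour is "right" is exactly the kind of detail where parsers end up disagreeing with each other.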
Parser correctness is irrelevant when you're talking about the ability to be written with few syntax errors. For instance, JSON has one type of string with one set of string escape rules. XML has element names, attribute names, attribute values, text nodes, CDATA content, RCDATA content, and more. And almost all of them have different rules for what they can contain and how they can be used.
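A quick illustration of the "one set of escape rules" point, using only Python's standard library. The helper name `encodings` is made up; the sketch just shows the same string written as a JSON string, as XML text content, and as an XML attribute value, which is only a small sample of the XML contexts listed above.

```python
import json
from xml.sax.saxutils import escape, quoteattr


def encodings(s: str) -> dict:
    """Return the same string as it must be written in three contexts."""
    return {
        "json_string": json.dumps(s),        # one rule: backslash escapes
        "xml_text": escape(s),               # & and < (and >) become entities
        "xml_attribute": quoteattr(s),       # also has to deal with the quotes
    }


sample = encodings('a < b & "c"')
```

Here `sample["json_string"]` is `"a < b & \"c\""`, `sample["xml_text"]` is `a &lt; b &amp; "c"`, and `sample["xml_attribute"]` is `'a &lt; b &amp; "c"'` (note the switch to single quotes around the value).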
By comparison, XML is orders of magnitude more complex than JSON.
Oh, it will be rejected all right. And then you're forced to override the parser, or to manipulate the XML before parsing it, because for some business reason having the source fix their XML is not an option.
People and machines are just utterly incapable of outputting valid XML.
While I'd agree that parsing JSON is much easier than XML, it is still not completely trivial, as demonstrated by this article: http://seriot.ch/parsing_json.php
Deserializing somebody else's XML to some usable internal data structures generally requires writing serialization/deserialization by hand and it is always a pain in the ass. On the other hand, JSON basic structures map to reasonable internal representations, so I often can simply iterate through the structures coming as-is from the parser library.
I mean, if the same webservice is offering the same data in both XML and JSON format, chances are I'd have to write less code for handling the JSON endpoint. For a client written in e.g. Java both cases may be pretty much equal, but for dynamic languages like JavaScript or Python, the difference is significant.
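A rough sketch of the difference in Python, with two made-up payloads (`JSON_DOC` and `XML_DOC`) carrying the same hypothetical user record. The JSON side is usable as soon as it is parsed; the XML side needs a hand-written mapping, because someone has to decide which parts are attributes, which are repeated child elements, and so on.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads for the same record.
JSON_DOC = '{"user": {"name": "Ada", "tags": ["admin", "dev"]}}'
XML_DOC = '<user name="Ada"><tag>admin</tag><tag>dev</tag></user>'


def user_from_json(text: str) -> dict:
    # The parser output already is dicts, lists and strings.
    return json.loads(text)["user"]


def user_from_xml(text: str) -> dict:
    # The mapping has to be spelled out for this particular schema.
    root = ET.fromstring(text)
    return {
        "name": root.get("name"),
        "tags": [tag.text for tag in root.findall("tag")],
    }
```

Both functions return `{"name": "Ada", "tags": ["admin", "dev"]}`, but only one of them had to know anything about the document layout beyond the top-level key.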
This is a straw man, IMO. Obviously, in production, the actual JSONs will interact very little with humans. But there's still development, debugging, etc.
So you will need to write small cases during development, tweak existing cases, etc.
Also, many tools accept configuration in JSON, which is somewhat convenient to write by hand, and is easily machine readable. Sublime Text comes to mind, for example.
XML generators and parsers have been in use for a decade+. Pretty sure most of the bugs have been found and fixed by now.
It's just reinventing the wheel because the new generation don't want to use the same tools the previous generation did. The time and effort spent doing this is quite ridiculous.
(FWIW, I hate XML, JSON is far better. But there are more important things to work on).
> Pretty sure most of the bugs have been found and fixed by now.
Given the complexity and what I've seen from some other long established codebases, I don't share your confidence.
> It's just reinventing the wheel because the new generation don't want to use the same tools the previous generation did.
You can disagree with the decisions involved (as you did with the XML vulnerability argument), but the fact that those arguments exist means they AREN'T doing it just because they don't want to use the same tools the previous generation did - they have different reasons that you think aren't good reasons.
Saying it as you did comes across as smug and dismissive, which is not an effective way of convincing your audience that you've taken arguments into account when making your decision.
Not the parent but my company consumed a bit of RSS starting in 2005 (and with the amounts declining to 0 through the years).
Over time we've been fed feeds with character encodings matching neither what the web server declared nor what the XML declared. Use of undeclared XML namespaces was another one, or, quite popular, using elements from other namespaces without any namespace prefix or declaration -- just shove some nice iTunes things or Atom things into the RSS. Also invalid XML -- just skipping the closing tags was popular.
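To show what a spec-following parser does with that kind of input, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The feed snippets are invented stand-ins for the problems described above, and `is_well_formed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if a strict XML parser accepts the document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Invented examples of the failure modes mentioned above.
GOOD = "<rss><channel><title>News</title></channel></rss>"
MISSING_CLOSE = "<rss><channel><title>News</channel></rss>"
UNDECLARED_PREFIX = (
    "<rss><channel><itunes:author>X</itunes:author></channel></rss>"
)
```

`is_well_formed(GOOD)` is true, while the other two are rejected outright, which is why a reader that has to accept such feeds ends up not being a strict XML parser at all.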
These feeds were from paying customers, and we were not the primary consumers - so when we complained they would generally point to someone else who was consuming it without problem. Sometimes we'd point them at a validator, if they were a small enough customer -- but mostly we just kept working on our in-house RSS feed reader that could read tag soup.
Things did massively improve over time, and by the end we were getting _mainly_ reasonably valid RSS.