by pointlessone
1168 days ago
I guess it depends on how you define the XML baseline. You can have very simple XML with only bare tags, and it will work just fine. Arguably, it's even simpler than JSON that way. A basic parser for that is probably no more complex than a JSON parser. All the optional complexity that can go on top, though, is probably better specified for XML. Transformation is well defined for XML (XSLT) but not at all for JSON (I guess you write your own code to manipulate native objects). Schemas are basically a native feature for XML. Not so much for JSON. All sorts of specialised vocabularies are defined for XML. A few are defined for JSON, too.
|
At first, XML namespacing sounds simple. Each tag and attribute has an optional URI attached to it, no big deal, right?
From reading through the specification, one could be forgiven for assuming that the prefixes are just arbitrary mappings that a processor can ignore, or automatically remap to alternate prefixes.
For example, it is true that <abc:a xmlns:abc="https://example.com/xyz" xmlns:def="https://example.com/xyz"><def:b>5</def:b></abc:a> (notice both prefixes are bound to the same namespace URI) is equivalent to: <a xmlns="https://example.com/xyz"><b>5</b></a>.
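To make that equivalence concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree, which reports element names in expanded "{uri}local" form and so drops the prefixes entirely. The helper name expanded_names is made up for illustration.

```python
import xml.etree.ElementTree as ET

doc_prefixed = (
    '<abc:a xmlns:abc="https://example.com/xyz" '
    'xmlns:def="https://example.com/xyz"><def:b>5</def:b></abc:a>'
)
doc_default = '<a xmlns="https://example.com/xyz"><b>5</b></a>'


def expanded_names(xml_text):
    """Return (expanded tag, text) pairs; hypothetical helper for this sketch."""
    root = ET.fromstring(xml_text)
    # ElementTree stores tags as "{namespace-uri}localname", ignoring prefixes.
    return [(el.tag, el.text) for el in root.iter()]


names_prefixed = expanded_names(doc_prefixed)
names_default = expanded_names(doc_default)

# Both documents yield the same expanded names and text.
print(names_prefixed == names_default)  # True
print(names_prefixed[1])  # ('{https://example.com/xyz}b', '5')
```

At the level of element names the two documents are indistinguishable, which is exactly why it is tempting to treat prefixes as disposable.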
Unfortunately, the data model also allows content to reference namespaces by prefix, so every general XML processor that supports namespaces must keep an application-accessible mapping from prefixes to namespaces, because the application may need that information to interpret attributes or content. The only exception would be a general XML processor that insisted on having schema information for every namespace it might come across. In that scenario it could tell whether an attribute value of "abc:b" is really a string literal or a namespace-qualified identifier (the QName data type), where the namespace is whatever the "abc" prefix is currently bound to and the identifier is "b".
But obviously we don't want to add full schema support to a simple implementation, so we need to keep the mapping information around, just in case the application needs it. We also cannot easily offer nice features like rewriting a document to use preferred prefixes for certain namespaces, unless we also keep every prefix that is used in a value that could be interpreted as a QName, just in case it actually is one. Our processor cannot know, because it has omitted schema support for simplicity (or perhaps it included schema support, but has no schema available for some namespace).
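A rough sketch of what "keeping the mapping around" looks like in practice, again with Python's standard library. The document, the attribute name ref, and the function resolve_qname_attrs are all hypothetical; the sketch simply assumes the application already knows that ref holds a QName, which is the knowledge a schema-less processor lacks.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: the "ref" attribute value is meant to be a QName.
doc = (
    '<root xmlns:abc="https://example.com/xyz">'
    '<item ref="abc:b"/>'
    '</root>'
)

# A plain parse keeps the value as an opaque string and discards the prefix map.
plain_value = ET.fromstring(doc).find("item").get("ref")


def resolve_qname_attrs(xml_text, attr_name):
    """Resolve attr_name values as QNames using the in-scope prefix bindings."""
    scopes = []    # stack of (prefix, uri) pairs, innermost binding last
    resolved = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    events = ("start-ns", "end-ns", "start")
    for event, payload in ET.iterparse(source, events=events):
        if event == "start-ns":
            scopes.append(payload)       # payload is a (prefix, uri) tuple
        elif event == "end-ns":
            scopes.pop()                 # binding goes out of scope
        else:
            value = payload.get(attr_name)
            if value is not None and ":" in value:
                prefix, local = value.split(":", 1)
                bindings = dict(scopes)  # later (inner) bindings win
                if prefix in bindings:
                    resolved.append((bindings[prefix], local))
    return resolved


print(plain_value)                      # abc:b
print(resolve_qname_attrs(doc, "ref"))  # [('https://example.com/xyz', 'b')]
```

Note that the resolution only works because the prefix bindings were captured during parsing. If a tool had renamed "abc" to some preferred prefix on the elements without also rewriting the attribute value, the lookup would silently fail.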
And that is just the complexity that stems from one fairly small quirk in how XML works.
You also have no idea whether an element's content needs to preserve whitespace if you don't know the schema and there is no xml:space attribute present. Thus, if you want to safely re-indent arbitrary XML for readability, you could end up with something like this: