| HN Mirror

For a lot of XML you need to support XML namespaces, and doing so adds a lot of complexity compared with plain, namespace-free XML.

At first, XML namespacing sounds simple: each tag and attribute name gets an optional namespace URI attached to it. No big deal, right?

From reading through the specification, one could be forgiven for assuming that prefixes are just arbitrary shorthand that a processor can ignore, or automatically remap to different prefixes.

For example, it is true that <abc:a xmlns:abc="https://example.com/xyz" xmlns:def="https://example.com/xyz"><def:b>5</def:b></abc:a> (note that both prefixes are bound to the same URI) is equivalent to <a xmlns="https://example.com/xyz"><b>5</b></a>.
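One way to check that equivalence is to compare expanded names with a namespace-aware parser. The sketch below uses Python's standard-library xml.etree.ElementTree, which reports every name as {namespace-uri}local-name and throws the prefixes away entirely. The helper expanded_tree is hypothetical, and it ignores tails, comments and processing instructions for brevity.

```python
import xml.etree.ElementTree as ET

def expanded_tree(elem):
    """Return (expanded tag, attributes, text, children) for elem and its subtree.

    ElementTree reports names as '{namespace-uri}local', so prefixes play no part.
    Tails, comments and processing instructions are ignored in this sketch."""
    return (
        elem.tag,
        dict(elem.attrib),  # xmlns declarations are not included as attributes
        elem.text or "",
        [expanded_tree(child) for child in elem],
    )

doc1 = ('<abc:a xmlns:abc="https://example.com/xyz" xmlns:def="https://example.com/xyz">'
        '<def:b>5</def:b></abc:a>')
doc2 = '<a xmlns="https://example.com/xyz"><b>5</b></a>'

same = expanded_tree(ET.fromstring(doc1)) == expanded_tree(ET.fromstring(doc2))
print(same)  # True
```

Both documents reduce to the same tree of {https://example.com/xyz}a containing {https://example.com/xyz}b with text "5". Note that the prefixes abc and def are simply gone after parsing.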

Unfortunately, the data model also allows content to reference namespaces by prefix. XML Schema's own xsi:type="xs:string" attribute is a familiar case: the "xs" in the value has to be resolved against whatever prefixes are in scope. Therefore every general-purpose XML processor that supports namespaces must keep an application-accessible mapping from prefixes to namespace URIs, because the application may need that information to interpret attribute values or text content. The only exception would be a processor that insisted on having schema information for every namespace it might encounter. Only then could it tell whether an attribute value of "abc:b" is really a string literal or a reference to a namespace-qualified name (the QName data type), where the namespace is whatever the prefix "abc" is currently bound to and the local name is "b".
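A minimal sketch of the bookkeeping this forces on the application, using Python's standard-library ElementTree.iterparse: track the prefix bindings in scope at each element, and use them to expand an attribute value that the application (not the parser) knows to be a QName. The function name resolve_qname_attribute and the ref attribute are hypothetical; unprefixed values are expanded with the default namespace, following the XML Schema QName rules.

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # the 'xml' prefix is always bound

def resolve_qname_attribute(xml_text, attr_name):
    """Treat the value of attr_name as a QName and expand it against the prefix
    bindings in scope at the element that carries it.

    Returns a list of (element tag, raw value, expanded name); the expanded name
    is '{uri}local', a bare local name if no namespace applies, or None if the
    prefix is undeclared. Assumes xml_text has no conflicting encoding declaration."""
    scopes = [{"xml": XML_NS}]  # stack: one prefix -> URI map per open element
    pending = {}                # declarations reported just before the next start tag
    results = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = item  # prefix is "" for a default namespace declaration
            pending[prefix] = uri
        elif event == "start":
            scope = {**scopes[-1], **pending}  # inherit parent bindings, apply new ones
            pending = {}
            scopes.append(scope)
            raw = item.get(attr_name)
            if raw is None:
                continue
            prefix, sep, local = raw.partition(":")
            if not sep:  # unprefixed QName: uses the default namespace
                prefix, local = "", raw
            uri = scope.get(prefix, "")
            if prefix and not uri:
                expanded = None  # prefix not declared at this element
            elif uri:
                expanded = "{%s}%s" % (uri, local)
            else:
                expanded = local  # no default namespace in scope
            results.append((item.tag, raw, expanded))
        else:  # "end": leave this element's scope
            scopes.pop()
    return results

doc = ('<root xmlns:abc="https://example.com/xyz">'
       '<item ref="abc:b"/>'
       '<item xmlns:abc="https://example.com/other" ref="abc:b"/>'
       '</root>')
for tag, raw, expanded in resolve_qname_attribute(doc, "ref"):
    print(tag, raw, expanded)
```

The two ref values are the identical string "abc:b", yet they expand to {https://example.com/xyz}b and {https://example.com/other}b respectively. Nothing in the elements' expanded names distinguishes them; only the retained prefix mapping does.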

But we obviously don't want to add full schema support to a simple implementation, so we have to keep the mapping information around in case the application needs it. For the same reason, we cannot easily offer nice features such as rewriting a document to use preferred prefixes for certain namespaces. To do that safely, we would also have to preserve every prefix used in a value that could be interpreted as a QName, just in case it actually is one. Our processor cannot know, because it has omitted schema support for simplicity (or it has schema support but no schema is available for some namespace).
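To make the prefix-rewriting problem concrete: Python's ElementTree never keeps the original prefixes, so re-serializing a document effectively rewrites them. In this sketch (rewrite_with_preferred_prefix and the ref attribute are hypothetical), element names survive the rewrite correctly, but a value that happens to be a QName is copied verbatim.

```python
import xml.etree.ElementTree as ET

def rewrite_with_preferred_prefix(xml_text, uri, preferred):
    """Re-serialize xml_text so that namespace `uri` is written with `preferred`.

    ElementTree drops the original prefixes when parsing, so element and attribute
    names are re-prefixed correctly, but attribute values and text are copied
    verbatim, so a QName such as "abc:b" keeps a prefix that may no longer exist."""
    ET.register_namespace(preferred, uri)  # caution: a process-wide registry
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")

doc = '<abc:a xmlns:abc="https://example.com/xyz" ref="abc:b"/>'
out = rewrite_with_preferred_prefix(doc, "https://example.com/xyz", "pref")
print(out)  # <pref:a xmlns:pref="https://example.com/xyz" ref="abc:b" />
```

The output declares only pref, so if ref really held a QName, its value now refers to an undeclared prefix. The processor had no way to tell.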

And that is just the complexity that stems from one fairly small quirk in how XML works.

You also have no idea whether an element's content needs its whitespace preserved unless you know the schema or happen to find an xml:space attribute on the element or one of its ancestors. So if you want to safely re-indent arbitrary XML for readability, you can end up with something like this:

    <abc
        ><def
            >5</def
        ></abc
    >
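The trick in the snippet above is that every added line break and indent goes inside a tag, just before its closing >, where the XML grammar permits optional whitespace, so no character data changes. A minimal sketch of such an indenter using Python's standard-library ElementTree follows; safe_indent is a hypothetical name, and the sketch assumes un-namespaced names and does not handle comments or processing instructions (ElementTree's default parser drops them anyway).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def safe_indent(elem, depth=0, unit="    "):
    """Serialize elem with line breaks and indentation placed only inside tags,
    just before each closing '>' or '/>', so no character data is altered.

    Sketch assumptions: names are un-namespaced, and comments and processing
    instructions are not handled."""
    parts = ["<" + elem.tag]
    for name, value in elem.attrib.items():
        parts.append(" %s=%s" % (name, quoteattr(value)))
    if elem.text is None and len(elem) == 0:
        # Empty element: break the line before '/>'.
        parts.append("\n" + unit * depth + "/>")
        return "".join(parts)
    # Close the start tag at the content's indentation level.
    parts.append("\n" + unit * (depth + 1) + ">")
    parts.append(escape(elem.text or ""))
    for child in elem:
        parts.append(safe_indent(child, depth + 1, unit))
        parts.append(escape(child.tail or ""))  # text after the child, untouched
    # Close the end tag at this element's own indentation level.
    parts.append("</" + elem.tag + "\n" + unit * depth + ">")
    return "".join(parts)

print(safe_indent(ET.fromstring("<abc><def>5</def></abc>")))
```

For <abc><def>5</def></abc> this reproduces the layout shown above. Mixed content such as <p>Hello <b>world</b> !</p> round-trips with its text and tail whitespace intact, which is the whole point of accepting the odd-looking output.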